


DOCUMENT RESUME

ED 096 828                                        FL 6?6 620

AUTHOR        Chafe, Wallace L.
TITLE         An Approach to Verbalization and Translation by
              Machine. Final Report.
INSTITUTION   California Univ., Berkeley. Dept. of Linguistics.;
              Rome Air Development Center, Griffiss AFB, N.Y.
REPORT NO     RADC-TR-74-271
PUB DATE      Oct 74
NOTE          122p.

EDRS PRICE    MF-$0.75 HC-$5.40 PLUS POSTAGE
DESCRIPTORS   *Artificial Intelligence; Cognitive Processes;
              *Computational Linguistics; *Computer Programs;
              Concept Formation; English; *Information Processing;
              Japanese; Lexicology; *Machine Translation; Models;
              Syntax
IDENTIFIERS   *Verbalization



ABSTRACT 

The report documents performance on a 24-month R&D effort
oriented toward the development of a computerized model for
machine translation of natural language. The model is built
around a set of procedures called verbalization, intended to
simulate the processes employed by a speaker or writer in turning
stored information into words. Verbalization is seen to consist
of subconceptualization and lexicalization processes which
involve creative choices on the part of the verbalizer, together
with algorithmic syntactic processes determined by the language
being used. Translation is viewed as (1) the reconstruction of
the verbalization processes which went into the original source
language text and (2) the application of parallel verbalization
processes in the target language. The target language
verbalization looks to the source language verbalization for
creative choices and tries to apply corresponding choices
simultaneously with application of syntactic processes dictated
by the grammar of the target language. Verbalization and
translation processes are illustrated in some detail with
examples taken from English and Japanese. Some of these processes
have been implemented in an interactive program on CDC 6600 at
the Lawrence Berkeley Laboratory (AEC), but the main intent of
the report is to demonstrate the kinds of processes that need to
be incorporated in such a system. (Author)






RADC-TR-74-271
Final Report
October 1974

AN APPROACH TO VERBALIZATION AND TRANSLATION BY MACHINE
The University of California at Berkeley

Approved for public release;
distribution unlimited.




U.S. DEPARTMENT OF HEALTH,
EDUCATION & WELFARE
NATIONAL INSTITUTE OF
EDUCATION



Rome Air Development Center 
Air Force Systems Command 
Griffiss Air Force Base, New York 



Foreword

This Final Technical Report was prepared by the University of
California at Berkeley, Department of Linguistics, Berkeley,
California under Contract F30602-72-C-0406, Job Order Number
45940805, for Rome Air Development Center, Griffiss Air Force
Base, New York. The work performed covered the period from
1 June 1972 through 31 May 1974.

Zbigniew L. Pankowicz (IRDT) was RADC Project Engineer.

This report was written by Wallace L. Chafe, with the
collaboration of other members of the Contrastive Semantics
Project. Associated with the project during its entire life
were Patricia M. Clancy, Leonard M. Faltz, Christopher Murano,
and Hasmig Seropian. Also active during more than half of this
period were Masayoshi Shibatani and [...]. [...] during shorter
periods of time were Teresa M. Chen, Charles J. Fillmore,
Robert E. Gaskins, and Marie-Claude Jorland.

Masayoshi Hirose served as a consultant on Japanese during the
last two months of the project.

This report has been reviewed by the Office of Information
(OI), RADC, and approved for release to the National Technical
Information Service (NTIS).

This report has been reviewed and is approved. 



APPROVED:

ZBIGNIEW L. PANKOWICZ
Project Engineer

APPROVED:

HOWARD DAVIS
Technical Director
Intelligence & Reconnaissance Division

FOR THE COMMANDER:

JAMES G. McGINNIS
Lt Col, USAF
Deputy Chief, Plans Office



Table of Contents

                                                        Page

       Abstract                                            1
I.     Overview                                            2
II.    Subconceptualization                               10
III.   An Example                                         23
IV.    Lexicalization of a CC                             31
V.     Lexicalization of a PI                             43
VI.    The Lexicon                                        50
VII.   Discourse Information and Readjustments            62
VIII.  Translation                                        70
IX.    Miscellaneous Problems in Translation              95
X.     Future Work                                       107
       Footnotes                                         113






EVALUATION

The report documents results of a 24-month research effort
that was directed at designing a computerized model for
machine translation of natural languages. The model is
conceptually based on simulation of mental activities
involved in verbalization (conversion of stored knowledge
into linguistic patterns of a source language) and translation
(reconstruction of source language verbalization and
application of parallel verbalization in a target language). The
target language verbalization consists in selection of
equivalents to the source language verbalization, combined
with application of syntactic conventions required by the
grammar of the target language.

The effort documented in this report is a direct continuation
of research on semantic and post-semantic processes, carried
on over the past several years by Dr. Wallace L. Chafe and
described in his book, "Meaning and the Structure of Language"
(University of Chicago Press, 1970). The semantic component is
postulated in Dr. Chafe's work as the basis of the theory of
language. This position constitutes a radical departure from
the modern structuralist and transformationalist trends
largely concerned with the syntactic component. Since
translation is traditionally defined as a transfer of meaning
from the linguistic pattern of a source language into that of
a target language, machine translation R&D has to account
for the semantic component in order to supply the deficiencies
of second generation MT models based on lexical and syntactic
aspects of natural languages.

ZBIGNIEW L. PANKOWICZ
Technical Evaluator



Abstract 



This report describes a model for machine translation
developed at Berkeley during 1972-74. The model is built around
a set of procedures called verbalization, intended to simulate
the processes employed by a speaker or writer in turning stored
knowledge into words. Verbalization is seen to consist of
subconceptualization and lexicalization processes which involve
creative choices on the part of the verbalizer, together with
algorithmic syntactic processes determined by the language being
used. Translation is viewed as (1) the reconstruction of the
verbalization processes which went into the original source
language text and (2) the application of parallel verbalization
processes in the target language. The target language
verbalization looks to the source language verbalization for
creative choices and tries to apply corresponding choices, at the
same time that it applies syntactic processes dictated by the
grammar of the target language. Verbalization and translation
processes are illustrated in some detail, with examples taken
from English and Japanese. Some of these processes have been
implemented in an interactive program using the facilities of the
Lawrence Berkeley Laboratory, but the main intent of the report
is to demonstrate the kinds of processes that need to be
incorporated in such a system.







I. Overview

Central to the view of translation that will be presented
here is the notion of verbalization. Verbalization is the
application of processes by which some holistic conceptual
chunk, recalled from memory, is converted into sentences and
words, that is, into a phonetically or graphically communicable
linguistic representation. Such a notion assumes that the
underlying content of what is being communicated is not, or need
not be, in verbal form to begin with. At the very least it may
be a complex system of discrete elements and relations,
representable perhaps as a network of nodes and arcs. It may also
involve an important nondiscrete or analog component,
representable only in some other terms. Whatever may turn out to
be the case here, it seems clear that some sorts of processes
must be applied in order to transform the original form of
storage into a verbal output: that the stored material must be
verbalized.

In any particular instance of translation there are two
instances of verbalization. One is the original verbalization
performed by the creator of the source language text. The other
is the verbalization produced in the target language by the
translator. Besides being in different languages, these two
verbalizations are fundamentally different in one other respect.
The source language verbalization is, we might say, autonomous.
It is freely produced by the speaker or writer in any way he
decides is appropriate to the content and the occasion, provided
he adheres to the rules of his culture and the language he is
using. The target language verbalization, on the other hand, is
parasitic on the source language one. Not only must the
translator adhere to the rules of his own language, he must also
produce a verbalization that communicates, so far as possible,
the same underlying content or knowledge that was communicated
by the source language verbalization. The verbalization in the
target language is thus subject to this special kind of
constraint. Its producer is not free to "say what he wants," but
must insofar as possible say the same thing as the producer of
the source language text. We suggested in an earlier report that
there are two dimensions of high quality translation, which we
termed naturalness and fidelity. Naturalness is achieved when the
target language verbalization adheres to all the constraints of
that language; the output will then sound "natural". Fidelity is
achieved to the extent that the target language verbalization
communicates the same content as the source language one.

Verbalization in general, as we see it, consists of a
mixture of two kinds of processes: those which necessitate
creative decisions on the part of the verbalizer and those which
do not, being governed by the constraints imposed by the
language. We might speak of creative processes and algorithmic
processes. Creative processes are ultimately governed by the
content which underlies the verbalization; the verbalizer has
to decide how best to verbalize that content. Normally a range
of choices will be open to him, and he must decide what will
most effectively convey what he has in mind. After he has made
such choices, there are often automatic consequences which
follow from them because of the particular rules of the language
(but which are themselves likely to lead to the necessity of
further creative choices). We can say, then, with respect to
the two verbalizations involved in a translation, that the
producer of the source language verbalization has applied both
creative and algorithmic processes, whereas in the target
language verbalization only algorithmic processes are
autonomously applied, the necessary creative choices being
determined by the choices that were made in the source language
verbalization. Thus the naturalness of the final translation
depends on adherence to the algorithmic processes of the target
language, while its fidelity depends on the extent to which the
translation has been able to incorporate creative choices that
correspond to those originally applied in the source language. In
all probability there are cases where exact correspondence in
these choices is not possible, and where a certain amount of
autonomous creativity has to be introduced into the target
verbalization as well. These are the cases where automatic
translation becomes most problematic. One useful goal of machine
translation research ought to be to determine precisely the
nature and extent of such cases.

We are led, then, to the general picture of translation
which is shown in Figure 1. The two vertical columns represent
the two verbalizations which are involved: on the left the
source language verbalization and on the right the target
verbalization. The input to a translation procedure, of course,
is an already produced verbal output or text in the source
language. The first major component of the translation procedure
will have to be the reconstruction from that text of the
verbalization processes by which it was produced, a kind of
"deverbalization". We will refer to this as the parsing
component, although it is clearly different from conventional
parsing. It aims to reconstruct, not a single deep structure
underlying the surface text, but rather a series of processes by
which that text was created from the knowledge (not only
nonverbal but possibly even nondiscrete) which the speaker or
writer had in mind. The output of the parsing component is
ideally a complete reconstruction of both the creative and the
algorithmic processes which the source language verbalizer
applied.

The other major component of the translation procedure is
the translation component. It is equivalent to a verbalization
in the target language. The processes which make up this
verbalization are, to the extent that they are algorithmic, those
which express target language constraints and, to the extent
that they are creative, those which correspond to choices
already made in the reconstructed source language verbalization.
The necessity of reference to the source language verbalization
for creative choices at many points is suggested in Figure 1 by
the zigzag arrows.

[Figure 1. Two columns, source language (left) and target
language (right), each leading from an initial nonverbal
representation to a verbal output.]

We believe that this picture provides a plausible basis
for translation research, but needless to say it presents many
problems whose solutions are only dimly foreseen at the present
time. Our project has so far concentrated more of its attention
on verbalization itself than on parsing or translation, since
both of the latter depend on a prior understanding of
verbalization. Any other ordering of priorities would be putting
the cart before the horse. Any detailed investigation of the
parsing component would be futile if we did not know what sort
of output we would expect that component to produce: the
processes that went into a particular verbalization. The
translation component is a verbalization, though one of a special
sort, and there again a detailed understanding of verbalization
processes is necessary. This report, then, will be most
concerned with the nature of verbalization. We will also devote
considerable space to the nature of that special sort of
verbalization which is translation. We will have the least to
say about parsing.

For about the last nine months of the project we have been
concerned with the development of an interactive computer
program that will implement the verbalization processes we
hypothesize. Although this program is primitive, the intention
is that it will gradually achieve increased sophistication in
its ability to simulate verbalization, translation, and parsing.
As it presently simulates the processes of verbalization, it
begins with an item that represents the initial holistic idea
which the speaker or writer of a text wishes to communicate.
It then asks the user, seated at a teletype, to make the series
of creative choices that are necessary in the production of the
final text. At the same time it attempts to apply on its own
the algorithmic processes which are called for. It knows when
creative choices are necessary, but not what choices to make.
The user must decide. But it should be able to apply the
algorithmic processes without help. As it simulates translation
it will be able to apply the algorithmic processes of the target
language automatically, and also to apply certain creative
processes on its own by looking at the source language
verbalization to see what creative choices were made there.
Whenever it is not able to make a creative choice, the program
asks the user to do so. We find that this kind of machine-user
interaction provides a valuable research technique. Taking as our
ultimate goal the eventual elimination of the user from the
translation program altogether, we are starting with a situation
in which the user intervenes at many points. As we learn more
we will gradually give the machine more to do and the user less.
This technique can be followed not only in verbalization and
translation, but also in parsing. Whether the user will
eventually disappear from the picture altogether is impossible to
predict at this point.
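
The division of labor just described can be pictured with a small sketch. It is not the program itself: the function name, the dictionary of reconstructed source choices, and the names of the choice points are all hypothetical, and the correspondence between a source choice and a target choice is simplified here to reusing the same value.

```python
from typing import Callable, Dict, Optional


def resolve_choice(
    point: str,
    source_choices: Optional[Dict[str, str]],
    ask_user: Callable[[str], str],
) -> str:
    """Settle one creative choice.

    In plain verbalization there is no source verbalization to consult
    (source_choices is None), so every creative choice goes to the user.
    In translation the choice recorded for the source verbalization is
    reused when one exists; otherwise the user is asked.
    """
    if source_choices is not None and point in source_choices:
        return source_choices[point]
    return ask_user(point)
```

As more choices become recoverable from the source verbalization, fewer calls fall through to `ask_user`, which is the sense in which the machine is gradually given more to do and the user less.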


However that may be, the goal of a program in which the
contribution of the user is significantly diminished in relation
to that of the machine seems workable. Short of the final goal
of eliminating the user altogether, an intermediate goal
identifiable as "human-aided" machine translation can more
easily be foreseen. Here the machine will do the many things for
which it is suited, but a human brain will be introduced at
those points where the machine has reached its limits. This
intermediate goal has, we believe, significant practical as well
as theoretical value.

II. Subconceptualization

We assume that a speaker or writer begins with a single
unitary, holistic conceptual chunk that he has recalled from
memory and has decided, for some reason, to communicate. Thus
he may have in mind some incident in which he was involved,
something of interest he was previously told about or read
about, some experiment he wishes to report on, or whatever. We
label such a chunk, as well as the smaller chunks into which it
will be analyzed, with the prefix CC (for "conceptual chunk")
followed by a four-digit number. The first digit indicates the
language in which verbalization is to take place ("1" for
English and "2" for Japanese), and the remaining three digits
constitute an arbitrary index for the particular chunk. Thus
CC-1001 might be the name given to some particular chunk of
this sort that is about to be verbalized in English.
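
The labeling convention can be made concrete with a short sketch. The class and function names below are illustrative assumptions, not part of the program described in this report; only the label format (the prefix CC, a language digit, and a three-digit index) comes from the text.

```python
import re
from dataclasses import dataclass

# Language digits as given in the text: "1" for English, "2" for Japanese.
LANGUAGES = {"1": "English", "2": "Japanese"}


@dataclass(frozen=True)
class ChunkLabel:
    language: str  # "English" or "Japanese"
    index: int     # arbitrary three-digit index of the chunk

    def __str__(self) -> str:
        digit = next(d for d, name in LANGUAGES.items() if name == self.language)
        return f"CC-{digit}{self.index:03d}"


def parse_label(label: str) -> ChunkLabel:
    """Split a label such as 'CC-1001' into its language and its index."""
    match = re.fullmatch(r"CC-([12])(\d{3})", label)
    if match is None:
        raise ValueError(f"not a conceptual chunk label: {label!r}")
    return ChunkLabel(LANGUAGES[match.group(1)], int(match.group(2)))
```

Under these assumptions, `parse_label("CC-1001")` yields an English chunk with index 1, and converting that value back to a string gives `CC-1001` again.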

We assume, furthermore, that while this chunk is from one
point of view a unit, from another point of view it has a more
or less rich content, and that it is this content which the
speaker wishes to convey to his audience. Sometimes, though
not in most cases, the initial chunk itself may have a
linguistic label. If it is a folktale, for example, it may have a
name like "Cinderella" or "The Three Bears". But someone who
has decided to tell a story is not likely to say just
"Cinderella" and let it go at that. (One is reminded of the old
story about a convention of comedians at which people said things
like "49" or "178" and elicited laughter each time because
everyone knew the jokes these numbers stood for.) Normally it
is necessary instead for the speaker to get inside the content
of this initial unit, to analyze it into smaller chunks. This
kind of process can be pictured as shown in Figure 2, where the
initial chunk CC-1001 has been, as we say, subconceptualized
into chunks CC-1002 and CC-1003. In a text of any size each of
these smaller chunks will be further broken down into still
smaller ones, and so on, so that a hierarchical structure of
successively smaller subconceptualizations emerges.

Subconceptualization belongs to the class of verbalization
processes which are creative. Normally a chunk does not
automatically determine a particular subconceptual breakdown, but
the speaker must creatively choose how to subconceptualize each
one. It is useful to think of the content of each chunk (each
circle in Figure 2) as if it were a mountainous landscape, with
the most salient aspects standing out in bold relief and the
less salient appearing as only minor hills. All other things
being equal, the more salient some aspect of the total content
is, the more likely the speaker is to express it when he
subconceptualizes. He is not likely to make exactly the same
subconceptual breakdown each time he communicates the same
initial chunk, partly because he may judge different things to be
salient in different contexts and partly because the landscape
itself may change over time, the relative salience of its
different aspects being modified in long-term memory. We assume
that any particular subconceptualization necessarily leaves out
part of the content of what is being subconceptualized, as
suggested by the area that lies within the larger circle but
outside the two smaller circles in Figure 2. Subconceptualization,
that is, is necessarily a selective process. No one ever says
everything he could say about what he has in mind.

[Figure 2. The chunk CC-1001 drawn as a large circle enclosing
two smaller circles.]

Subconceptualization of a particular chunk, say CC-1001,
produces two or more new chunks, say CC-1002 and CC-1003. These
new chunks, furthermore, are conceived of as related to each
other in some way. For example, CC-1002 might be the "reason"
for CC-1003. Suppose the entire text consisted of the
sentences, "I bought a bike yesterday. I decided I need more
exercise." Let us say that the first sentence is a verbalization
of CC-1003 and the second sentence of CC-1002. We can say that
CC-1002 is the reason for CC-1003. We write a
subconceptualization process of this kind the following way:

   1)  CC-1001  S>  CJ-REASON (CC-1002, CC-1003)

This statement says that the initial chunk, CC-1001, is
subconceptualized (S>) into the chunks CC-1002 and CC-1003, and
that these two new chunks are related by the predicate labeled
CJ-REASON. The prefix CJ stands for "conjunction" (derived from
the grammatical, not the logical use of this term). Any
relation between CC's is labeled with this prefix.



We use a different notation to represent each of the
various stages in the verbalization process. At the outset, in
this example, the initial chunk CC-1001 was all that was
present. This initial representation, before any verbalization
processes had been applied, was simply:

   2)  CC-1001

After the subconceptualization specified in 1) was applied, the
representation became:

   3)  CJ-REASON
           CC-1002
           CC-1003

Subconceptualization processes are thus rewrite rules, which
replace one stage in a verbalization with a subsequent stage.
The format we use to represent such stages, as in 3), shows
predicates with their arguments written indented below them.
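
The step from 2) to 3) can be sketched as a rewrite over a small tree. This is an illustration only: the `Predicate` class, the function names, and the four-space indentation are assumptions made for the example, while the rule applied is the one written in 1).

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Predicate:
    name: str           # for example "CJ-REASON"
    args: List["Node"]  # arguments, written indented below the predicate


Node = Union[str, Predicate]  # a bare string stands for a CC label


def subconceptualize(rep: Node, target: str, relation: str, parts: List[str]) -> Node:
    """Rewrite rule: replace the chunk `target` by CJ-<relation> over `parts`."""
    if isinstance(rep, str):
        if rep == target:
            return Predicate(f"CJ-{relation}", list(parts))
        return rep
    new_args = [subconceptualize(arg, target, relation, parts) for arg in rep.args]
    return Predicate(rep.name, new_args)


def render(rep: Node, depth: int = 0) -> str:
    """Show each predicate with its arguments indented below it."""
    pad = "    " * depth
    if isinstance(rep, str):
        return pad + rep
    lines = [pad + rep.name] + [render(arg, depth + 1) for arg in rep.args]
    return "\n".join(lines)


stage_2 = "CC-1001"
stage_3 = subconceptualize(stage_2, "CC-1001", "REASON", ["CC-1002", "CC-1003"])
print(render(stage_3))
```

Printing `stage_3` gives CJ-REASON on the first line with CC-1002 and CC-1003 indented beneath it, the same shape as representation 3).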

In simulating verbalization our program presently asks the
user to specify all the creative choices, restricting its own
contribution to the application of algorithmic processes
determined by the grammar of discourse, sentences, and words in
the language involved. The program is labeled VAT (for
"verbalizer and translator"), and we can illustrate conversations
between VAT and the user identifying them as V and U
respectively. The program begins by asking:

   4)  V: WHAT VAT TASK DO YOU WANT PERFORMED?

to which one possible answer is:

   5)  U: VERBALIZE CC-1001

Skipping several steps to illustrate only the rough outlines of
subconceptualization, we are interested just now in the question:

   6)  V: HOW IS CC-1001 SUBCONCEPTUALIZED?

to which a possible answer is:

   7)  U: REASON (CC-1002, CC-1003)

At this point VAT will construct the representation shown in 3).

VAT will now apply an algorithmic or, as we say, syntactic
process triggered by the presence of CJ-REASON in 3). The
process applied is of a type that is not yet clearly understood,
but we may view what we do at present as a first approximation.
At the moment VAT simply takes the two CCs related by CJ-REASON
and orders them so that the second will be expressed before the
first. That is, for example, if CC-1002 is eventually going to
be verbalized as "I decided I need more exercise" and CC-1003
as "I bought a bike yesterday", we want the two sentences to be
expressed with CC-1003 preceding CC-1002. Thus VAT will
automatically change the representation in 3) to the following:

   8)  CC-1003
       CC-1002

This kind of representation, in which no predicate is shown
above the two CCs, indicates that they (or their eventual
verbalizations) are to occur in the final text in the order shown,
with CC-1003 preceding CC-1002.

In Japanese the corresponding syntactic process will
typically lead to the attachment of CJ-"KARA" at the end of the
second sentence. Thus if a representation like that in 3) were
produced in a Japanese verbalization VAT would automatically
change it to:

   9)  CC-1003
       CC-1002
       CJ-"KARA"

The quotation marks around "KARA" indicate that this is an item
which will actually appear as a word in the text. Quotation
marks are used for items that have a surface lexical
representation. The representation in 9) is deficient in that it
fails to show that CJ-"KARA" will be part of the same sentence as
CC-1002, whereas CC-1003 will (or is likely to) form a
different sentence. We indicate sentence boundaries with the
notation CJ-".", since the period will appear in the final text.
Thus fuller versions of 8) and 9) are, respectively:

   10)  CC-1003
        CJ-"."
        CC-1002
        CJ-"."

   11)  CC-1003
        CJ-"."
        CC-1002
        CJ-"KARA"

The creation of these periods is a housekeeping task that need
not be described in detail here.
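
The two syntactic processes triggered by CJ-REASON can be sketched as one small function. The function name and the flat list used for the output are assumptions for illustration; the sketch produces the simpler representations 8) and 9) and leaves out the sentence-boundary housekeeping.

```python
from typing import List


def apply_reason_syntax(first: str, second: str, language: str) -> List[str]:
    """Algorithmic step for CJ-REASON (first, second).

    `first` is the reason for `second`. The second argument is ordered
    before the first; in Japanese the item CJ-"KARA" is then attached
    at the end, as in representation 9).
    """
    ordered = [second, first]
    if language == "Japanese":
        ordered.append('CJ-"KARA"')
    return ordered


english = apply_reason_syntax("CC-1002", "CC-1003", "English")
japanese = apply_reason_syntax("CC-1002", "CC-1003", "Japanese")
print(english)
print(japanese)
```

No creative choice is involved in this step: given the relation and the language, the outcome is fixed, which is what makes the process algorithmic in the sense used here.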

Given a representation like that in 10), VAT will go on to
ask about the subconceptualization of the first CC in the
ordering. The general principle followed here is one of "depth
first", in the sense that earlier items in the text are
completely verbalized before the verbalization of later items is
begun. This procedure probably has some psychological validity;
that is, a speaker is likely to think of later parts of what he
is going to say only in terms of the most general chunks, while
he is elaborating the earlier parts in detail. Only after he
has finished the verbalization of these earlier parts will he
turn his attention to a full verbalization of the later ones.

Thus, omitting various considerations not as yet discussed,
subconceptualization proceeds interactively in the following
fashion:

   12)  V: WHAT VAT TASK DO YOU WANT PERFORMED?
        U: VERBALIZE CC-1001

        (VAT creates the following representation:)

        CC-1001

        V: HOW IS CC-1001 SUBCONCEPTUALIZED?
        U: REASON (CC-1002, CC-1003)

        (VAT creates first the following representation:)

        CJ-REASON
            CC-1002
            CC-1003

        (and immediately applies a stored syntactic algorithm that
        changes it to:)

        CC-1003
        CC-1002

        V: HOW IS CC-1003 SUBCONCEPTUALIZED?
        etc.

In this fashion a subconceptual hierarchy of any degree of
complexity can be constructed and expressed.
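
The depth-first questioning in 12) can be imitated with a scripted user. The sketch below is not the VAT program: the function name is hypothetical, only the REASON answer is handled, the chunks CC-1004 and CC-1005 are invented for the example, and a chunk for which the script holds no answer is simply treated as terminal, which is an assumption since the later steps of verbalization are not covered here.

```python
import re
from typing import Dict, List, Tuple


def verbalize_depth_first(initial: str, script: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (chunks in the order asked about, terminal chunks in text order)."""
    agenda = [initial]        # chunks still to be verbalized, in text order
    asked: List[str] = []
    text: List[str] = []
    while agenda:
        chunk = agenda.pop(0)  # always take the earliest item in the text
        asked.append(chunk)    # V: HOW IS <chunk> SUBCONCEPTUALIZED?
        answer = script.get(chunk)
        if answer is None:     # assumption: no answer means a terminal chunk
            text.append(chunk)
            continue
        match = re.fullmatch(r"REASON \((CC-\d{4}), (CC-\d{4})\)", answer)
        if match is None:
            raise ValueError(f"unsupported answer: {answer!r}")
        reason, result = match.groups()
        # Syntactic step for CJ-REASON: the second CC precedes the first.
        # The new chunks go to the front so they are finished before later items.
        agenda[0:0] = [result, reason]
    return asked, text


script = {
    "CC-1001": "REASON (CC-1002, CC-1003)",
    "CC-1003": "REASON (CC-1004, CC-1005)",  # hypothetical further breakdown
}
asked, text = verbalize_depth_first("CC-1001", script)
print(asked)
print(text)
```

With this script the questions are asked about CC-1001, CC-1003, CC-1005, CC-1004 and only then CC-1002, so the earlier part of the text is fully elaborated before the later part is touched.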

The organization of a text may not be entirely
hierarchical, however. Not only does a speaker break down larger
chunks into smaller chunks, larger "concepts" into subconcepts;
one chunk may also remind him of another, so that the organization
which results may be in part concatenative. We have been
viewing concatenation in terms of excursions away from the main
hierarchy, and have been calling such excursions digressions.
In some discourse, however, there is no necessary constraint
that the main hierarchy be returned to, and the result may be a
rambling text in which digression is added to digression. In a
more tightly organized text digressions are more likely to
appear as parenthetical remarks: brief sidepaths which quickly
return to the main hierarchy. We use the term parenthesis for
this brief and transient kind of digression.

If subconceptualization can be represented in terms of a
tree diagram (which does not, however, provide a convenient
means of showing the relations between subconcepts, like
CJ-REASON), then digressions can be pictured as subtrees attached
to the main tree at one point or another, as suggested in
Figure 3.

One other important modification of the strictly
hierarchical model of subconceptualization results from the common
occurrence of summarization. It is frequently the case in
verbalization that an initial chunk will be subject to two
separate hierarchies of subconceptualization, one of which can be
identified as a summary of the other. It is characteristic of
a summary that its subconceptualization processes never proceed
beyond some relatively large chunks, chunks which package a
relatively large content. We can contrast a subconceptualization
hierarchy which is a summary with one which constitutes the
body of the text and consists of subconceptualization processes
which produce a larger number of chunks of smaller size.

A summary is typically expressed at the beginning or end
of a text; that is, preceding or following the body. Various
conventions for summaries are associated with different genres
of writing. For example, a scientific article may begin with
the self-conscious kind of summary that is called an abstract; a
news report typically contains an opening paragraph telling who,
what, where, and when; a fable is likely to end with a moral; and
so on. Our program at present simply asks, for the initial CC,
whether it has an initial summary (one expressed at the
beginning of the text). If the answer is yes it asks first for a
subconceptualization of the summary, and moves on to ask about
the body of the text only after the summary has been completely
verbalized. At the end of the text it asks whether there is a
final summary.
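
The order in which the parts of a text are taken up can be sketched in a few lines. The function name and the part names are illustrative assumptions; only the ordering (initial summary, then body, then final summary) comes from the description above.

```python
from typing import List


def plan_text(has_initial_summary: bool, has_final_summary: bool) -> List[str]:
    """List the parts of a text in the order they are to be verbalized."""
    plan: List[str] = []
    if has_initial_summary:
        plan.append("initial summary")  # completely verbalized before the body
    plan.append("body")
    if has_final_summary:
        plan.append("final summary")    # asked about only at the end of the text
    return plan


print(plan_text(True, False))   # e.g. a scientific article with an abstract
print(plan_text(False, True))   # e.g. a fable that ends with a moral
```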


Creativity within a discourse is likely to be limited by
the genre to which the discourse belongs. It would appear that
there is a continuum ranging from maximally stereotyped to
maximally creative discourse. Most stereotyped are those forms of
discourse, such as rituals, in which the speaker has very little
choice as to what he is going to say or how he is going to say
it. With such discourse the "grammar" of the genre provides
many of the answers to the questions VAT would otherwise have to
ask the user. In other words, VAT should be able to produce
ritual texts with a minimum amount of recourse to creative
decisions. At the other extreme are forms of discourse such as
descriptions of unique personal experiences which have never been
described before, where the speaker is relatively free to make a
great variety of creative decisions.

We believe it would be of considerable interest to incorporate
into the verbalization process the constraints imposed by
several different genres, but we have not as yet done this. As
it now stands our program does ask WHAT IS THE GENRE? as soon
as it has established that a verbalization is to be performed.
Possible answers that we hope to implement in the future are,
for example, NEWS REPORT, PSYCHOLOGY ARTICLE, FABLE, and the
like.







An example of these procedures, as applied to a real text,
can be based on the following United Press report taken, slightly
condensed, from the San Francisco Chronicle of May 16, 1974:

13)  1. An 11-year-old boy using a new "super-glue"
     2. accidentally glued his eye shut
     3. while building a model airplane,
     4. and a doctor had to reopen the eye surgically.
     5. Mike Harris said
     6. he rubbed his left eye
     7. after several drops of the glue squirted into it
        last Sunday
     8. and found his eyelid would not move.
     9. An eye surgeon debated briefly about
    10. using a super glue solvent
    11. but decided against it
    12. for fear it might damage the boy's eye.
    13. The surgeon, who asked not to be identified,
    14. finally put Mike in the operating room,
    15. trimmed Mike's eyelashes,
    16. then opened the eyelid surgically.
    17. Mike was released from the hospital Tuesday.

It is approximately the case that each of the numbered lines in
this text expresses a terminal subconcept. We assume that the
text contains a number of intermediate subconcepts as well,
which need to be elucidated in a subconceptual hierarchy.



Let us suppose that the combination of VAT and the user are
attempting to simulate the verbalization processes that went
into the production of this text. For the moment we are
concerned only with subconceptualization processes (and associated
syntactic algorithms). Many of the user's answers in the
following conversation with VAT are intuitively based. The success
of our eventual parsing component will depend on the extent to
which these intuitive answers can be predicted from the text
together with whatever items of background knowledge are relevant.
The example will be carried only far enough to suggest the
nature of the procedure.

The exchange begins in the usual way: 

14) V: WHAT VAT TASK DO YOU WANT PERFORMED? 
U: VERBALIZE CC-1001 

VAT creates the following representation, including a text-final
period:

15) CC-1001
    CJ-"."

VAT's next question seeks to establish what genre constraints
apply in this text:

16) V: WHAT IS THE GENRE?
    U: NEWS REPORT

VAT will now assume that the text is a typical news report which
begins with a summary. Its first questions will deal with the
subconceptualization of the summary (expressed in the text in
sentences 1-4):

17) V: HOW IS CC-1001 SUBCONCEPTUALIZED IN THE SUMMARY?
    U: YIELDS (CC-1002, CC-1003)

The user has answered that the first breakdown of the summary is
into two subconcepts, CC-1002 (to be expressed as lines 1-3) and
CC-1003 (line 4). Furthermore, the relation between these two
CCs has been identified as one labeled YIELDS, in which the first
CC "leads to" or "results in" the second. YIELDS differs from
another, similar relation which is labeled CAUSE in that the
event conceptualized by the second CC is not a necessary
consequence of the first. It is, however, something that presumably
would not have happened if the event conceptualized by the first
CC had not taken place. As a result of the user's answer in 17)
VAT first creates the representation:

18) CJ-YIELDS
    CC-1002
    CC-1003
    CJ-"."

and immediately applies a syntactic process which changes it to:

19) CC-1002
    CJ-", AND"
    CC-1003
    CJ-"."



That is, the two CCs are to be expressed with the "yielder"
preceding the "yielded", and they are to be connected with a comma
followed by the word "AND". This is not the only way in which
YIELDS can be realized, but for the sake of the example we may
regard it as such. VAT will now proceed to ask about the
subconceptualization of the earliest CC in 19):

20) V: HOW IS CC-1002 SUBCONCEPTUALIZED IN THE SUMMARY?
    U: FRAMES (CC-1004, CC-1005)



The user has answered that CC-1002 is broken down into two CCs,
CC-1004 (line 3) and CC-1005 (lines 1-2). They are related by
FRAMES, a temporal relation in which the first CC occupies a
time period larger than and including the time period of the
second. In other words the time period of sentence 3 includes
that of sentences 1-2. VAT creates, sequentially, the following
two representations:

21) CJ-FRAMES
    CC-1004
    CC-1005
    CJ-", AND"
    CC-1003
    CJ-"."

22) CJ-"WHILE"
    CC-1004
    CC-1005
    CJ-", AND"
    CC-1003
    CJ-"."






Although there may be several possibilities for the expression
of FRAMES, VAT has assumed here that two factors are
involved: an ordering of the two CCs so that the "framer"
precedes the "framed", and a prefixing of the word "WHILE" to the
first CC. In this example the ordering of these two CCs will be
reversed in a subsequent operation.

We would expect VAT to ask next about the subconceptualization
of CC-1004, but by means not yet discussed VAT will
discover that this is a terminal CC (one not further
subconceptualized). If CC-1004 were followed by "." or by ", AND", VAT
would proceed to ask questions directed at the complete
verbalization of this CC. But since CC-1004 is not followed by one of
these boundaries, attention is next focused on CC-1005:

23) V: HOW IS CC-1005 SUBCONCEPTUALIZED IN THE SUMMARY?
    U: FRAMES (CC-1006, CC-1007)

VAT creates the following representation:

24) CJ-"WHILE"
    CC-1004
    CJ-FRAMES
    CC-1006
    CC-1007
    CJ-", AND"
    CC-1003
    CJ-"."

The user has said that CC-1006 (line 1) occupies a time period
which includes CC-1007 (line 2). So far we would expect this
second instance of FRAMES to be expressed by prefixing the word
"WHILE" to CC-1006, as was done in 22). Let us suppose,
however, that FRAMES actually triggers a more complex algorithm
which says in effect that one "WHILE" in a sentence is enough,
and that a second instance of FRAMES will lead to a different
expression. Here the second instance leads to the creation of a
relative clause which will modify one of the constituents of
CC-1007. Furthermore, the already created "WHILE" clause will
be moved to a position after CC-1007. (This ordering of the CCs
does appear to be maximally natural. It would be slightly less
desirable, for example, to produce "While he was building a model
airplane an 11-year-old boy, using a new "super-glue",
accidentally glued his eye shut." Certainly, however, the differences
in this area are very subtle.) We will indicate the relative
clause status of CC-1006, to be embedded within the expression of
CC-1007, with a slash notation:

25) CC-1007 / CC-1006
    CJ-"WHILE"
    CC-1004
    CJ-", AND"
    CC-1003
    CJ-"."
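
The more complex FRAMES algorithm supposed here can be sketched as a
single rewriting step from 24) to 25). The list encoding and the
function name realize_second_frames are assumptions made for
illustration, and the sketch covers only the configuration of this
example, in which the "WHILE" clause stands directly before the
second FRAMES node.

```python
def realize_second_frames(rep):
    """Rewrite a representation like 24) into one like 25).

    The framer becomes a relative clause on the framed CC (slash
    notation), and the existing WHILE clause moves after that CC.
    """
    i = rep.index("CJ-FRAMES")
    w = rep.index('CJ-"WHILE"')
    if w + 2 != i:
        # Assumption of this sketch: CJ-"WHILE" and its CC come
        # immediately before CJ-FRAMES, as they do in 24).
        raise ValueError("WHILE clause must directly precede CJ-FRAMES")
    framer, framed = rep[i + 1], rep[i + 2]
    while_clause = rep[w:i]                 # CJ-"WHILE" plus its CC
    relative = f"{framed} / {framer}"       # e.g. CC-1007 / CC-1006
    return rep[:w] + [relative] + while_clause + rep[i + 3:]

rep24 = ['CJ-"WHILE"', "CC-1004", "CJ-FRAMES", "CC-1006", "CC-1007",
         'CJ-", AND"', "CC-1003", 'CJ-"."']
rep25 = realize_second_frames(rep24)

for line in rep25:
    print(line)
```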

The representation in 25) will be discovered to be the final
one in the subconceptualization of the summary, which has been
found to consist of four CCs (ultimately four clauses) joined
together in the manner indicated. VAT will now proceed to
verbalize the summary completely, making use of other kinds of
processes. When that has been done, it will say:

26) V: WE NOW MOVE TO THE BODY OF THE TEXT. HOW IS CC-1001
       SUBCONCEPTUALIZED?
    U: YIELDS (CC-1002, CC-1003)

This is, of course, the same answer that was given to the
corresponding question in 17) above. As CC-1002 and CC-1003 are
further elaborated, however, many differences will emerge.
Ultimately CC-1002, which was expressed in sentences 1-3 of the
summary, will be expressed in the body of the text in sentences 5-8.
CC-1003, expressed in the summary as sentence 4, will be
expressed in the body in sentences 9-17.

We will not repeat here the operations involved in the
subconceptualization of the body of the text. They are for the most
part similar to those illustrated above. Various other relations
between CCs are introduced: for example, that between CC-1015
(lines 9-12) and CC-1016 (lines 13-16). The first of these CCs
involves an alternative that is rejected in favor of the
alternative conceptualized in the second; thus, the relation may be
labeled REJECTED-IN-FAVOR-OF. Within CC-1015 there is a
relation of CONCESSION (denial of expectation) between CC-1017
(lines 9-10) and CC-1018 (lines 11-12). It will be of
considerable interest to isolate relations of this sort in a variety
of texts, and to determine the ways in which they may be
expressed under varying circumstances in different languages.

The text does contain one example of a parenthesis,
expressed in the nonrestrictive relative clause in line 13. The
fact that the surgeon asked not to be identified is a minor
digression from the mainstream of the account. It is attached
to the node representing the surgeon which will become a
constituent of CC-1032 (lines 14-16).




IV. Lexicalization of a CC

We use the term lexicalization to refer to another major
component of verbalization: specifically to a cluster of
processes that are involved in the choice of a particular linguistic
expression for a CC. Subconceptualization breaks down an
initial, holistic chunk into smaller chunks. These smaller chunks,
however, remain conceptual in nature, and other operations are
necessary to convert them into surface linguistic
representations. Roughly speaking, lexicalization involves the choice of
"words" that will appropriately communicate the content of CCs.

Lexicalization of a CC takes place at the point where the
speaker decides that he has subconceptualized far enough. The
aim of subconceptualization is to produce chunks of a size
appropriate to linguistic expression, and particularly to
linguistic expression that will convey neither too little nor too
much information to the addressee. Too little information is,
for example, provided by a summary, where subconceptualization
has proceeded only to a point where lexicalization will provide
the addressee with a "general idea" of the content of the whole.
At the other end of the scale, we are all familiar with
expositions in which too much information is conveyed, where we are
told more than we want to know. One aspect of a speaker's
creativity, then, is to decide exactly where in the process of
subconceptualization he should stop, taking into account the needs
and interests of the addressee. It is at this point that he
turns to lexicalization.

The speaker may also be influenced in such decisions by
the resources his language makes available for packaging chunks
of different sizes. Consider, for example, the amount of
content that is packaged in an English sentence like "He hit into
a double play." If our language did not provide this
particular expression, we would have to subconceptualize this chunk
considerably further and come up with chunks that would have
to be expressed in some such way as "He hit the ball to the
shortstop, who threw it to the second baseman before the runner
previously on first base could reach second. The second
baseman then threw the ball to the first baseman before the batter
could reach first. Thus his hit caused two outs to be made."
Presumably a language makes available packaging at various
levels of subconceptualization according to predominant
communicative needs within the culture of its speakers.

How are conceptual chunks communicated? One way to
approach this question is by looking at the spatial and temporal
properties of such chunks. A chunk is typically either an event
("He rubbed his left eye") or a situation ("The glue was next
to the lamp"). Both events and situations have a particular
locus in space and time (the difference being that an event
involves some spatial change through time, whereas a situation
does not). Such chunks, then, can be regarded as
assignable to particular coordinates in both a spatial and a
temporal continuum. (We omit consideration here of generic
chunks, expressed in sentences like "Dogs chase cats" or "The
house had two chimneys", where at least temporal particularity
is absent. Genericness calls for extended discussion that would
take us too far afield at this point.)

If we assume that most of the chunks a speaker wants to
find linguistic expression for are events or situations, and thus
have both spatial and temporal particularity, it is not
surprising that language fails to provide direct labels for them.
We cannot, in the course of subconceptualization, arrive at
something like CC-1011, then remember that the name for this chunk is
"BLURG", and communicate it by uttering that word. Particular
events and situations are too numerous, and our experience of
them too idiosyncratic, for them to have their own particular
names. The way this problem is solved is through the
interpretation of many different CCs as instances of the same category.
Thus the time last December when I gave my mother a Christmas
present, the time when the mailman gave me a registered letter
this morning, the time yesterday when the teacher gave my son a
note to take home, etc. etc. are all categorizable as instances
of "giving". We label the category itself UC-"GIVE" (UC
standing for "universal category") and specify the choice of this
category by the speaker with the notation:

27) CC-1053 C> UC-"GIVE"

Such a statement is to be read "CC-1053 is categorized as an
instance of the category UC-"GIVE"". It should be noted that
the English word "GIVE" is not the name of this category; rather,
any particular CC which is so categorized can be communicated
with the word "GIVE". In other words, the decision described in
27) allows us to use "GIVE" as a name for CC-1053.

The way in which a speaker decides that a particular CC can
be categorized as an instance of some UC is of course a
fundamental psychological question. One thing that seems clear is
that some CCs are more easily categorized than others; ease of
categorizability has been called "codability". In a closer
approximation to human mental processes, therefore, a statement
like 27) ought to be qualified as valid to a certain degree, and
not as an all-or-nothing decision. If the degree to which a
particular CC is an instance of some UC is very high (if the CC
is highly codable), then the use of the word provided by the UC
will succeed quite well in conveying the content which the
speaker has in mind. If, on the other hand, the content of the CC is
not very well captured by assigning it to the UC, then the
speaker is likely to want to add one or more modifiers to mold the
content more closely to the content of the CC he has in mind.
Adverbs are an obvious device by which such molding is
accomplished. Thus, the speaker might decide that the content of
CC-1053 is better captured in an intersection of UC-"GIVE" and
UC-"GRUDGING":

28) CC-1053 C> UC-"GIVE" & UC-"GRUDGING"

in which case the eventual lexicalization will be "give
grudgingly", and not simply "give".
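
The idea that a categorization holds only to a certain degree can be
illustrated with a small sketch. The numeric codability score, the
threshold of 0.8, and the names Categorization and categorize are
all assumptions introduced for illustration; the report does not say
how codability would be measured or represented.

```python
from dataclasses import dataclass, field

@dataclass
class Categorization:
    cc: str                  # e.g. "CC-1053"
    category: str            # e.g. "GIVE"
    degree: float            # hypothetical codability score in [0, 1]
    modifiers: list = field(default_factory=list)

    def notation(self):
        """Render the statement in the C> notation of 27) and 28)."""
        cats = [f'UC-"{c}"' for c in [self.category] + self.modifiers]
        return f"{self.cc} C> " + " & ".join(cats)

def categorize(cc, category, degree, modifier=None, threshold=0.8):
    """Add the modifier only when the bare category fits poorly."""
    result = Categorization(cc, category, degree)
    if degree < threshold and modifier is not None:
        result.modifiers.append(modifier)
    return result

high = categorize("CC-1053", "GIVE", degree=0.95, modifier="GRUDGING")
low = categorize("CC-1053", "GIVE", degree=0.60, modifier="GRUDGING")
print(high.notation())
print(low.notation())
```

A fixed threshold is the simplest possible choice; a fuller model
would presumably let the speaker's judgment about the addressee
decide whether a modifier is worth adding.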

Suppose CC-1053 is a conceptual chunk that will eventually
be verbalized with the sentence:

28) Mrs. Brown gave Tommy a cookie.

We have said that the word "GIVE" is available as a label for
this CC. Up to a point that is correct; there was a giving
which took place. But sentence 28) contains more than the word
"GIVE". What kind of conceptual information is conveyed by "MRS.
BROWN", "TOMMY", and "A COOKIE"? Each of these items evidently
communicates a concept that is different in nature from a CC.
This other kind of concept we label a PI (for "particular
individual"). The chief difference between a PI and a CC seems to
have to do with temporal particularity. A CC is conceived of as
occupying a specific and usually fairly limited period of time.
The time period occupied by, say, Mrs. Brown is much less
specific, and is not likely to be something we are very interested
in when we utter a sentence like 28). In other words, while a
PI may have temporal particularity in the sense of a lifespan or
total time of existence, such a time period tends to be of a
different order of magnitude from that occupied by a CC, and more
often than not is of little relevance when the PI is
communicated. Furthermore, any one PI may participate in an
indeterminate number of different CCs. (Mrs. Brown has done many other
things besides that which was reported in 28).)

Why do PIs play a necessary role in the communication of a
CC? The answer may have something to do with the necessity for
providing anchor points in the addressee's mind. Because of its
lack of temporal particularity, the concept of a PI is a
relatively stable concept, and one which is liable to enter
consciousness again and again with respect to a wide variety of
CCs. Thus, the only way a speaker can effectively install the
content of a CC in the addressee's mind is to tie it to one or
more PIs already known to the addressee. That is, the usual way
of communicating information is by bringing one or more PI nodes
into the addressee's consciousness, and by predicating something
of these nodes. Language usually involves taking one PI (the
"topic") as a starting point and either predicating something of
it alone, or tying it to other PIs through a relational
predicate.

In deciding to categorize a CC in a certain way, say as an
instance of UC-"GIVE", a speaker simultaneously establishes a
framework of PIs which are separated out from the content of the
CC, and which will have to be linguistically represented in some
way. In the case of UC-"GIVE" these PIs will function as agent,
beneficiary, and patient (the giver, the givee, and the given).
The fact that these three PIs are entailed by the choice of
UC-"GIVE" is expressed as follows:

29) CC-A C> UC-"GIVE"  E>
    CC-A F> VB-"GIVE" (PI-B↑AGT, PI-C↑BEN, PI-D↑PAT)

The letters A, B, C, and D in this statement are variables
ranging over particular four digit numbers. For example, CC-A might
be CC-1053, PI-B might be PI-1687, etc. The symbol E> is to be
read "entails", and F> is to be read "is framed as". (The
notation to the right of F> can be regarded as a "case frame"; hence
the appropriateness of the term "framing". One might also
imagine that this kind of operation involves "framing" an utterance
in the sense of deciding on its basic linguistic framework.)

The statement in 29), then, says that when one has chosen
to categorize a particular CC as an instance of UC-"GIVE", this
decision entails that the CC will be framed as, or expressed by,
the verb (VB) "GIVE" accompanied by three PIs, functioning as
agent, beneficiary, and patient. Statements like that in 29)
are stored in our English lexicon. This statement actually forms
only part of the lexical entry for UC-"GIVE". The complete entry
for this category contains a number of additional lines which
state various other entailments, for example that giving involves
transfer of ownership. These other aspects of lexical entries
will be discussed below.
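
A lexical entry of this kind can be pictured as a table from
categories to case frames. The following Python sketch is
illustrative only: the dictionary layout and the function name frame
are assumptions, the role marker is written here as an upward arrow
(itself an assumption about the notation), and only the framing line
of the entry for UC-"GIVE" is represented, not its other
entailments.

```python
# Hypothetical in-memory lexicon: each universal category (UC) maps
# to the verb (VB) that expresses it and the case roles of its frame.
LEXICON = {
    "GIVE": {"verb": "GIVE", "roles": ("AGT", "BEN", "PAT")},
}

def frame(category, lexicon=LEXICON):
    """Apply the F> line of an entry: return the verb and its PI slots."""
    entry = lexicon[category]
    # PI variables start at B because A names the CC itself.
    slots = [f"PI-{chr(ord('B') + k)}↑{role}"
             for k, role in enumerate(entry["roles"])]
    return [f'VB-"{entry["verb"]}"'] + slots

framed = frame("GIVE")
for line in framed:
    print(line)
```

Looking up a category that has no entry raises KeyError in this
sketch; how the actual program treated an unknown category is not
stated.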

To summarize, a CC of the appropriate size, arrived at
through subconceptualization, will be subject to categorization
in terms of some UC, the effect of which will be to create, by
way of the lexicon, a verbal label for the CC together with a
framework of associated nouns. The framing operation, in
effect, will have factored out those elements (PIs) having no
significant temporal particularity, leaving a word (the VB) to
which alone that temporal particularity will be assigned.

It is probably a consequence of its being left with this
temporal role that the VB is likely to end up carrying a
temporal marker of some kind, such as a tense and/or aspect suffix.
If, for example, the CC occupies a temporal locus that precedes
the locus of the speech act, the VB is likely to end up with a
past tense suffix attached. This part of lexicalization we call
inflection. Its implementation will be illustrated immediately
below.

Our program tries to establish at the outset for each CC
whether it can be categorized, on the assumption that the
speaker is aiming at such categorization as a goal, and that
subconceptualization takes place only when the content of the CC is
such that categorization is not appropriate. Thus the first
question asked of any CC is of the sort:

30) V: CAN CC-1053 BE CATEGORIZED?

If the user's answer is no, VAT goes on to ask how this CC is to
be subconceptualized, as in the example given in section III.
If, on the other hand, the user's answer is yes, VAT will go on
to ask questions relevant to the tense/aspect properties of the
CC. At present it asks first:

31) V: IS CC-1053 GENERIC?

since special considerations have to be given to CCs that do not
have temporal particularity. If the answer to 31) is no, VAT
presently assumes as a default option that CC-1053 has a temporal
locus preceding that of the speech act. This is certainly the
most probable state of affairs for most kinds of discourse. We
expect later to elaborate other possibilities, which are likely
to depend on adverbial and other means of establishing temporal
particularity. Our program at present will, under these
circumstances, add the inflectional notation "PAST" after a slash, as
in:

32) CC-1053 / "PAST"

It is now time for the following exchange:

33) V: HOW IS CC-1053 CATEGORIZED?
    U: GIVE

The user says that the decision has been to categorize this CC
as an instance of the category UC-"GIVE". VAT then looks into
the lexicon and, on the basis of the last line in 29), replaces
32) with:

34) VB-"GIVE" / "PAST"
    PI-B↑AGT
    PI-C↑BEN
    PI-D↑PAT



Two other considerations are relevant at this point. For
one thing, VAT will want to replace the PI variables in 34) with
particular four digit numbers. Our easiest recourse at present
is to have VAT ask the user about each PI:

35) V: WHAT IS THE AGENT?
    U: PI-1234
    V: WHAT IS THE BENEFICIARY?
    U: PI-1345
    V: WHAT IS THE PATIENT?
    U: PI-1456

whereupon VAT will replace 34) with:

36) VB-"GIVE" / "PAST"
    PI-1234↑AGT
    PI-1345↑BEN
    PI-1456↑PAT



At least some of the answers to the questions in 35) ought,
under some circumstances, to be derivable from the context. We
hope gradually to teach VAT to discover such answers for itself
whenever possible.

A second consideration at this point is to establish which
PI is the topic. Again the easy way out is for VAT to ask the
user:

37) V: WHAT IS THE TOPIC?
    U: PI-1234

In English, at least, this may be the point at which functional
relations such as agent, beneficiary, and patient should be
replaced by surface syntactic roles like subject, indirect object,
and direct object. (In Japanese the introduction of particles
like wa, ga, o, and ni would be appropriate here.) Thus, after
37) VAT may change the representation in 36) to:

38) VB-"GIVE" / "PAST"
    PI-1234↑SUBJ
    PI-1345↑IO
    PI-1456↑DO

where IO and DO stand for "indirect object" and "direct object".
Again, the identity of the topic will often be derivable from
the context. For example, all other things being equal, topics
have a tendency to remain constant from one clause to the next,
agents are more likely to be topics than patients, and so on.
Considerable empirical work will be necessary before all such
factors have been sorted out.
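
The substitution of PI numbers in 35) and 36) and the replacement of
functional relations by surface roles in 38) can be sketched as
follows. The function names bind and to_surface, the list encoding,
and the upward arrow used as role marker are assumptions made for
illustration. The sketch handles only the case shown in the text,
where the topic is the agent; what VAT should do with any other
topic is not described there.

```python
# Mapping from functional relations to English surface roles, as in 38).
SURFACE_ROLE = {"AGT": "SUBJ", "BEN": "IO", "PAT": "DO"}

def bind(rep, answers):
    """Replace PI variables by the user's answers, keyed by role."""
    bound = [rep[0]]
    for item in rep[1:]:
        _, role = item.split("↑")
        bound.append(f"{answers[role]}↑{role}")
    return bound

def to_surface(rep, topic):
    """Replace AGT/BEN/PAT by SUBJ/IO/DO when the topic is the agent."""
    pairs = [item.split("↑") for item in rep[1:]]
    agent = next(pi for pi, role in pairs if role == "AGT")
    if agent != topic:
        # Only the agent-as-topic case is illustrated in the text.
        raise NotImplementedError("topic other than the agent")
    return [rep[0]] + [f"{pi}↑{SURFACE_ROLE[role]}" for pi, role in pairs]

rep34 = ['VB-"GIVE" / "PAST"', "PI-B↑AGT", "PI-C↑BEN", "PI-D↑PAT"]
rep36 = bind(rep34, {"AGT": "PI-1234", "BEN": "PI-1345", "PAT": "PI-1456"})
rep38 = to_surface(rep36, topic="PI-1234")

for line in rep38:
    print(line)
```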

If the codability of CC-1053 had been somewhat lower, and
the modified categorization exemplified in 28) had been chosen,
the representation at this stage would include an adverb (AV):

39) VB-"GIVE" / "PAST" / AV-"GRUDGING"
    PI-1234↑SUBJ
    PI-1345↑IO
    PI-1456↑DO

The lexicalization of CC-1053, then, has involved
categorization, possibly modification, inflection, and framing. The
next step in verbalization is to lexicalize the several PIs which
are contained in a representation like 38) or 39). We will see
that the lexicalization of a PI involves categorization,
possibly modification, and inflection. Framing is for the most part
restricted to the lexicalization of a CC.






V. Lexicalization of a PI

A PI is the concept of a concrete object, be it animate or
inanimate, or of an abstraction which has been reified and is
being treated linguistically in ways analogous to the treatment
of physical objects. The surface linguistic representation of
a PI may be a proper noun, a common noun, a pronoun, or nothing
at all. Furthermore, by agreement processes certain features of
the PI may be incorporated into the verb with which it is
associated. Each language has its own idiosyncrasies in the
treatment of PIs. Some, like Japanese, are especially fond of
deleting the PI altogether whenever it is predictable from
context. Some, of the polysynthetic type, seem to go overboard in
the extent to which they incorporate features of the noun within
the verb. Some make a point of adding inflectional features
expressing "definiteness", plurality, and the like to the surface
noun, while others seem to get along well without such
expression. For illustrative purposes we will confine ourselves in
this section to the main outlines of how a PI is lexicalized in
English.

Much depends on whether or not the PI in question is
"given": whether it is a piece of knowledge that the speaker
believes has already been brought into the addressee's
consciousness in some way, prior to the uttering of the present
sentence. Here again we have a case where the easiest course
for VAT is to ask the user:

40) V: IS PI-1234 GIVEN?

Certainly in many cases, however, VAT can be taught to decide
this for itself. If, for example, PI-1234 was mentioned in the
preceding sentence the answer to 40) must be yes. If the
preceding sentence was "Mrs. Brown came over from next door" and
we are concerned with the lexicalization of PI-1234 within the
sentence "PI-1234 gave Tommy a cookie", the givenness of PI-1234
will result in its lexicalization as "SHE". We can actually go
a fair distance in establishing the givenness of a PI on this
basis alone, but the question of how else givenness is
established, including its introduction from knowledge external to
the linguistic text altogether, will need to be raised
eventually.



Let us assume first that the answer to 40) has been yes, in
which case English is likely to lexicalize PI-1234 with a
pronoun. This is not always the case; sometimes a PI that is
given will not be pronominalized. The principal criterion here
seems to be whether pronominalization will produce ambiguity,
and ultimately VAT will need to decide whether ambiguity will
result. For now, however, we proceed on the assumption that a
PI which is given will automatically be pronominalized.

The procedure we are currently using for pronominalization
in English asks first:

41) V: IS PI-1234 THE ADDRESSEE?

This question is asked first because the pronoun "YOU" does not
distinguish number, and if the answer to 41) is yes it will not
be necessary for VAT to do anything beyond lexicalizing PI-1234
as NN-"YOU" (NN, of course, for "noun"). If, on the other hand,
the answer to 41) is no, then VAT must ask:

42) V: WHAT IS THE CARDINALITY OF PI-1234?

We assume that a PI is from one point of view the concept of a
set of objects, and that the cardinality of the set is relevant
in establishing expressions of singularity and plurality, among
other things. Actually the distinction between one and more
than one as possible answers to 42) is all that is relevant at
the moment. More interesting questions do arise in this area.
For example, with cardinalities up to about five there is likely
to be a need for distinguishing each member of the set with a
specific PI number, whereas with larger cardinalities the set is
likely to be conceived of simply as containing "a number of" or
"many" members.

If we assume first that the answer to 42) is one, then VAT
will ask:

43) V: IS PI-1234 THE SPEAKER?

If the answer is yes, then PI-1234 is lexicalized as NN-"I". If
no, then we are dealing with a third person referent and VAT
must determine its gender:

44) V: IS PI-1234 ANTHROPOMORPHIC?

This classification includes human beings, but also named
animals such as pets. If the answer to 44) is no, VAT will
lexicalize PI-1234 as NN-"IT". Otherwise it must find the sex of
this referent:

45) V: IS PI-1234 MALE OR FEMALE?

and lexicalize it as NN-"HE" or NN-"SHE" accordingly.

If the answer to 42) was a number greater than one, VAT
must decide between "WE" and "THEY", the pronouns which are
explicitly plural. Essentially it must ask:

46) V: IS THE SPEAKER A MEMBER OF PI-1234?

If yes, it will produce the lexicalization NN-"WE" and if no,
NN-"THEY".
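
Questions 41) through 46) form a small decision tree, which can be
sketched in Python. The function names pronominalize and scripted,
and the convention that answers arrive as the strings "YES", "NO",
"MALE", "FEMALE", or a numeral, are assumptions made for
illustration. The ask argument stands in for the exchange with the
user, so that each question is put only when the earlier answers
make it necessary.

```python
def pronominalize(pi, ask):
    """Choose an English pronoun for a given PI by asking 41) to 46)."""
    if ask(f"IS {pi} THE ADDRESSEE?") == "YES":            # 41)
        return 'NN-"YOU"'
    if int(ask(f"WHAT IS THE CARDINALITY OF {pi}?")) == 1:  # 42)
        if ask(f"IS {pi} THE SPEAKER?") == "YES":          # 43)
            return 'NN-"I"'
        if ask(f"IS {pi} ANTHROPOMORPHIC?") == "NO":       # 44)
            return 'NN-"IT"'
        sex = ask(f"IS {pi} MALE OR FEMALE?")              # 45)
        if sex not in ("MALE", "FEMALE"):
            raise ValueError(f"unexpected answer: {sex}")
        return 'NN-"HE"' if sex == "MALE" else 'NN-"SHE"'
    if ask(f"IS THE SPEAKER A MEMBER OF {pi}?") == "YES":  # 46)
        return 'NN-"WE"'
    return 'NN-"THEY"'

def scripted(answers):
    """Return an ask function replaying canned answers, plus its log."""
    remaining = iter(answers)
    log = []
    def ask(question):
        log.append(question)
        return next(remaining)
    return ask, log

ask, log = scripted(["NO", "1", "NO", "YES", "FEMALE"])
print(pronominalize("PI-1234", ask))
print(len(log), "questions asked")
```

Replacing ask with a function that consults stored discourse
information before falling back on the user would be one way to let
the program answer some of these questions for itself.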

There are again a variety of ways in which VAT might be
able to answer questions like 41) through 46) without asking the
user. The identity of speaker and addressee will have been
established by providing such discourse parameters at the very
beginning of the discourse; at present we use the arbitrary
convention that PI-1001 is the speaker and PI-1002 the addressee.
In questions 41) and 43) VAT is asking whether PI-1234 is
identical to PI-1002 or PI-1001. But, depending on the context,
this identity may already have been established. As for the
cardinality of PI-1234, it may have been made explicit through
a numeral or in some other way. And the gender of this referent
might have been established through the previous use of a
sex-specific proper name, or through some other fact that has
already been supplied.

Let us now turn to the possibility that PI-1234 is not
given (that the answer to question 40) was no). In that case,
lexicalization must be either in terms of a proper name, or
through the use of a categorization and ultimately a common noun.
VAT first asks:

47) V: DOES PI-1234 HAVE A NAME?

If yes, the user gives the name and VAT lexicalizes PI-1234 as
NN-"JOHN" or the like. The real situation is not quite this
simple, since a PI is likely to have more than one proper name
(John, Mr. Brown, Daddy, etc.) and the choice of which, if any,
among them to use will depend on various interpersonal
considerations. Eventually our program should include questions
relevant to such a choice.

If the answer to 47) is no, then VAT follows a procedure roughly analogous to that associated with the categorization of a CC:

48) V: HOW IS PI-1234 CATEGORIZED?
    U: TEACHER

(for example). Basically, at this point, VAT will replace PI-1234 with NN-"TEACHER". At the same time it will store the statement:

49) PI-1234 C> UC-"TEACHER"

and will look at the lexical entry for this category for whatever relevant information is stored there.

Just as a CC may be given a lexicalization that is inflected for tense and/or aspect, the lexicalization of a PI may be given inflections such as number and/or definiteness. If the lexicon shows, for example, that UC-"TEACHER" entails that PI-1234 is countable, VAT will also in this case ask about its cardinality, as in 42) above. If the answer is a number greater than one, VAT will create a representation like NN-"TEACHER" / "PLURAL". Independent of this number question, VAT will need to determine whether the use of this category in this context will enable the addressee to know what particular instance of the category is being talked about. We put this in terms of the question:

50) V: DOES UC-"TEACHER" IDENTIFY PI-1234?

If yes, VAT will add the definite article (AR) as an inflection: NN-"TEACHER" / AR-"THE". If no (that is, if the addressee is assumed not to be able to identify a previously known PI as the referent), VAT will decide between the indefinite articles AR-"A" and AR-"SOME" depending on whether the cardinality of PI-1234 is one or greater than one. The outcome will thus be either NN-"TEACHER" / AR-"A" or NN-"TEACHER" / "PLURAL" / AR-"SOME"; that is, "a teacher" or "some teachers". We have attempted to formalize some of the contextual grounds on which VAT will be able to answer a question like 50) without asking the user, and this matter will be discussed in section VII, below.
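The number and article inflections just described can be sketched as follows. This is a hypothetical Python illustration under the assumption that the category is countable; the function and parameter names are not VAT's, with `cardinality` standing for the answer to question 42) and `identified` for the answer to question 50).

```python
def inflect_common_noun(noun, cardinality, identified):
    """Return a VAT-style representation such as NN-"TEACHER" / AR-"THE".

    Hypothetical sketch for a countable category only.
    """
    parts = [f'NN-"{noun}"']
    if cardinality > 1:
        parts.append('"PLURAL"')
    if identified:
        # The addressee can tell which instance is meant: definite article.
        parts.append('AR-"THE"')
    elif cardinality > 1:
        parts.append('AR-"SOME"')
    else:
        parts.append('AR-"A"')
    return " / ".join(parts)
```

For a single unidentified teacher, `inflect_common_noun("TEACHER", 1, False)` yields the representation corresponding to "a teacher"; with a cardinality above one it yields the one corresponding to "some teachers".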



VI. The Lexicon 



In all its operations VAT must at many points make access to a store of more or less permanent lexical knowledge which we have formalized in terms of entailments of categories. The statements in the lexicon specify what we know about a particular CC or PI as a result of its being identified as an instance of a certain category. Or, to look at it from the opposite point of view, these statements say what properties a particular CC or PI must have in order to be categorized in a certain way. From the first point of view we can say that once we know that a particular CC has been categorized as an instance of UC-"GIVE", for example, the lexicon tells us a number of other things that we must know about this CC. From the second point of view we can say that the lexical entry for UC-"GIVE" tells us what we must know about a CC in order to assign it to this category. These two ways of viewing lexical entries are not in contradiction, but are different sides of the same coin.

From a psychological standpoint the lexicon approximates a description of everything that is involved in a person's interpretation of the world, at least so far as his interpretive grid is dependent on verbal categories. We are unable, of course, to focus on individual differences, but must attempt to deal with a core that is common to the speakers of a particular language. The lexicon is the heart of our program, whether we are engaged in verbalization, translation, or parsing, and everything else depends on the success with which the lexicon has been elaborated. A separate lexicon has to be developed for each language with which the program tries to deal. In a full-fledged implementation certainly a very high proportion of the total developmental effort will have to be devoted to lexical questions.

As a simple illustration of the kind of information a lexical entry might contain, as well as of the formalism we have been using to represent such information, let us consider at least part of what it means for a particular CC to be categorized as an instance of UC-"LIFT". We will want to say that when X lifts Y, this entails that X does something which causes a change of state from Y being in one location to Y being in another location, and furthermore that the new location is above the old location. The lexical entry for UC-"LIFT", insofar as it captures this much information, is written as follows:

51) CC-A C> UC-"LIFT"
    E>
       CC-A F> VB-"LIFT" (PI-B↑AGT, PI-C↑PAT)
       CC-A S> CJ-CAUSE (CC-D, CC-E)
       CC-D F> VB-ACT (PI-B)
       CC-E S> CJ-CONJUNCTION ((CJ-CHANGE (CC-F, CC-G)), CC-H)
       CC-F F> VB-AT (PI-C, PL-I)
       CC-G F> VB-AT (PI-C, PL-J)
       CC-H F> VB-ABOVE (PL-J, PL-I)

The first two lines are to be read, "If CC-A is categorized as an instance of UC-"LIFT", this entails..." The first line under E> then gives the case frame, saying that there will be a clause containing the verb "LIFT" accompanied by an agent (PI-B) and a patient (PI-C). The second line under E> says that it is alternatively possible to subconceptualize CC-A in a certain way, which amounts to a paraphrase. That is, although the speaker has chosen not to subconceptualize CC-A further (presumably because the choice of UC-"LIFT" has been judged to provide the right packaging for CC-A), if he had decided to subconceptualize further he could have done it in the manner specified in this line, where two new CCs, CC-D and CC-E, are joined by CJ-CAUSE. In other words CC-D is conceived of as causing CC-E. The third line under E> says something about the content of CC-D, namely that it involves an act by PI-B. (It may be noted that the absence of quotes around ACT in VB-ACT indicates that this is not a conceptual unit that will lead to a direct surface structure representation, as will VB-"LIFT".) The fourth line under E> says that CC-E, which is caused by this act, can be subconceptualized into two conjoined elements. The first of these is a CHANGE from CC-F to CC-G, and the second is CC-H. The fifth and sixth lines under E> specify the nature of the prior and subsequent states, CC-F and CC-G. Both involve PI-C being at some location, first PL-I and then PL-J (PL standing for "particular location"). The last line elucidates CC-H, stating that the new location (PL-J) is above the old location (PL-I). Thus 51) has captured formally the several bits of knowledge about CC-A that were summarized discursively at the beginning of this paragraph.
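One way to picture the paraphrase part of entry 51) is as a table from each CC to its content, which can then be unfolded recursively. The sketch below is a hypothetical Python rendering, not the representation VAT used: the dictionary layout and the function name `expand` are assumptions, and the case-frame line for CC-A is left out so that each CC has a single content.

```python
# Paraphrase lines of entry 51): each CC maps to (predicate, arguments).
# An argument is a variable name or a nested (predicate, arguments) pair.
LIFT_ENTRY = {
    "CC-A": ("CJ-CAUSE", ["CC-D", "CC-E"]),
    "CC-D": ("VB-ACT", ["PI-B"]),
    "CC-E": ("CJ-CONJUNCTION", [("CJ-CHANGE", ["CC-F", "CC-G"]), "CC-H"]),
    "CC-F": ("VB-AT", ["PI-C", "PL-I"]),
    "CC-G": ("VB-AT", ["PI-C", "PL-J"]),
    "CC-H": ("VB-ABOVE", ["PL-J", "PL-I"]),
}


def expand(term, entry):
    """Recursively replace each CC label by its content in the entry."""
    if isinstance(term, tuple):
        predicate, args = term
        inner = ", ".join(expand(arg, entry) for arg in args)
        return f"{predicate} ({inner})"
    if term in entry:
        return expand(entry[term], entry)
    return term  # a PI or PL variable: nothing further to expand


full_paraphrase = expand("CC-A", LIFT_ENTRY)
```

Unfolding CC-A in this way spells out the discursive summary: an act by PI-B causes a change in the location of PI-C, conjoined with the new location being above the old one.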

Let us now turn to a more complicated example. This example came up initially as a result of the observation that the Japanese verb kasu can be translated into English as either rent (out) or lend. In other words this verb is nonspecific as to whether the agent does or does not receive money for the goods or services he provides. We were interested in how a translation from Japanese into English would decide whether to use rent or lend where the Japanese had used kasu. This problem led us to consider lexical entries for several verbs involving transfers and transactions, and we arrived at a system of cross-referencing and embedding within lexical entries that captures the content of abstract notions (such as transfer and transaction) at the same time that it links entries one to another in a way that is generally useful.

We may begin by defining a transfer. We assume a category UC-TRANSFER which, since it does not contain quotation marks, is understood to be abstract and not immediately convertible into a surface structure verb. The lexical entry reads as follows:

52) CC-A C> UC-TRANSFER
    E>
       CC-A S> CJ-CHANGE (CC-B, CC-C)
       CC-B F> VB-HAVE (PI-D, PI-E)
       CC-C F> VB-HAVE (PI-F, PI-E)

Discursively, a CC-A which has been categorized as an instance of UC-TRANSFER can alternatively be subconceptualized (or paraphrased) in terms of a change from CC-B to CC-C, where the former involves PI-D "having" PI-E, and the latter involves another party, PI-F, having PI-E. In other words, a transfer involves a change in the having of some object (PI-E) from one individual to another. The English word have of course performs a variety of semantic functions; our use of it in this formalism is meant to include at least two varieties of having: ownership, which we will label HAVE-OWN, and having the use of something, which we will call HAVE-USE. Simple HAVE, as in 52), is meant to be nonspecific as to which of these varieties of having is involved, as may be accounted for with the following two statements:



53) CC-A C> UC-HAVE-OWN
    E>
       CC-A C> UC-HAVE

    CC-A C> UC-HAVE-USE
    E>
       CC-A C> UC-HAVE

One example of a transfer is the kind which is categorizable with UC-"GIVE", whose lexical entry can be given as follows:

54) CC-A C> UC-"GIVE"
    E>
       CC-A F> VB-"GIVE" (PI-B↑AGT, ?PI-C↑BEN, PI-D↑PAT)
       CC-A C> UC-TRANSFER
          PI-D = PI-B
          PI-F = PI-C
          PI-E = PI-D

That is, a CC which has been categorized as an instance of UC-"GIVE" has the case frame shown in the first line under E>. The question mark before the beneficiary indicates that it is optional; one can say "Roger gave a book" without mentioning a beneficiary. The second line under E> shows that this CC can also be categorized as an instance of UC-TRANSFER. This fact means that the CC also has the entailments listed in 52). Since the variables within each lexical entry are arbitrarily labeled A, B, C, etc., it is necessary now to state equivalences between the variables in the entry for UC-"GIVE" and those in the entry for UC-TRANSFER. These equivalences are listed, indented, in the last three lines of 54). They are to be read, "PI-D of the TRANSFER entry is equivalent to PI-B of the "GIVE" entry (the giver); PI-F of the TRANSFER entry is equivalent to PI-C of the "GIVE" entry (the givee); and PI-E of the TRANSFER entry is equivalent to PI-D of the "GIVE" entry (the given)." In this way 54) and 52) are brought into the correct alignment.
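The alignment of variables between an embedded entry and its host can be pictured as a simultaneous substitution. The following is a minimal Python sketch under that assumption; the data layout and the names `TRANSFER_STATES`, `TRANSFER_TO_GIVE`, and `align` are hypothetical and not taken from VAT.

```python
# The two state lines of the TRANSFER entry 52): (CC, verb, arguments).
TRANSFER_STATES = [
    ("CC-B", "VB-HAVE", ("PI-D", "PI-E")),
    ("CC-C", "VB-HAVE", ("PI-F", "PI-E")),
]

# Equivalences from 54): TRANSFER variable -> "GIVE" variable.
TRANSFER_TO_GIVE = {"PI-D": "PI-B", "PI-F": "PI-C", "PI-E": "PI-D"}


def align(statements, equivalences):
    """Rewrite an embedded entry's statements in the host entry's variables.

    Each argument is looked up once in the original mapping, so the
    substitutions PI-D -> PI-B and PI-E -> PI-D do not interfere.
    """
    return [
        (cc, verb, tuple(equivalences.get(arg, arg) for arg in args))
        for cc, verb, args in statements
    ]


give_states = align(TRANSFER_STATES, TRANSFER_TO_GIVE)
```

Read in the variables of the "GIVE" entry, the earlier state then has the giver (PI-B) having the given object (PI-D), and the later state has the givee (PI-C) having it.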

Another, more complicated kind of transfer is that involved in the category UC-"LEND":

55) CC-A C> UC-"LEND"
    E>
       CC-A F> VB-"LEND" (PI-B↑AGT, ?PI-C↑BEN, PI-D↑PAT)
       CC-A C> UC-TRANSFER
          PI-D = PI-B
          PI-F = PI-C
          PI-E = PI-D
          CC-B = CC-E
          CC-C = CC-F
       CC-E C> UC-HAVE-USE
       CC-F C> UC-HAVE-USE
       VB-HAVE-OWN (PI-B, PI-D)
       CC-A -C> UC-TRANSACTION

The first seven lines of this entry are entirely parallel to the entry for UC-"GIVE" in 54). It then becomes necessary to refer to the earlier and later states, CC-B and CC-C, of the TRANSFER entry. These are equated with CC-E and CC-F of the "LEND" entry. It is said that both of these states involve HAVE-USE. That is, when X lends an object to Y, in the earlier state X has use of the object and in the later state Y does. The next to last line says that PI-B, the agent of the lending, maintains ownership of PI-D throughout. The last line says that CC-A cannot be categorized as a transaction, as explained below. Evidently the only difference between 55) and the entry for UC-"KAS-" (i.e. kasu) in Japanese is that for the latter the last line of 55) is missing. Thus, kasu leaves it undecided whether a transaction was involved or not.



What, then, is a transaction? Essentially it is a linking of two transfers, where one of the transfers is for the purpose of the other. In buying, for example, a typical transaction, the buyer gives money to the seller so that the seller will give him some object in return. With buying, a change of ownership is involved in both transfers, but that need not be the case. With renting, for example, there is a change of ownership of the money, but only a change of use of the object. We define a transaction as follows:



56) CC-A C> UC-TRANSACTION
    E>
       CC-A S> CJ-PURPOSE (CC-B, CC-C)
       CC-B C> UC-TRANSFER
          PI-D = PI-D
          PI-E = PI-E
          PI-F = PI-F
       CC-C C> UC-TRANSFER
          PI-F = PI-D
          PI-E = PI-G
          PI-D = PI-F



The first line under E> states that CC-A can be paraphrased in terms of CC-B and CC-C, the former being for the purpose of the latter. CC-B is a transfer in which PI-D (e.g. the buyer) transfers PI-E (e.g. money) to PI-F (e.g. the seller). CC-C is a transfer in which the roles of PI-D and PI-F (and hence their relation to the variables in 52)) are reversed. Furthermore, the object transferred (e.g. the thing bought) is a different one, here PI-G.



Besides buying and selling, another typical transaction is renting. The English word rent is ambiguous, and we will illustrate here the entry for what we call UC-"RENT-2", which is renting out (German vermieten):

57) CC-A C> UC-"RENT-2"
    E>
       CC-A F> VB-"RENT" (PI-B↑AGT, ?PI-C↑BEN, ?PI-D↑MSR, PI-E↑PAT)
       CC-A C> UC-TRANSACTION
          PI-F = PI-B
          PI-D = PI-C
          PI-E = PI-D
          PI-G = PI-E
          CC-B = CC-F
          CC-C = CC-G
       CC-F C> UC-TRANSFER
          CC-B = CC-H
          CC-C = CC-I
       CC-G C> UC-TRANSFER
          CC-B = CC-J
          CC-C = CC-K
       PI-D C> UC-MEDIUM-OF-EXCHANGE
       CC-H C> UC-HAVE-OWN
       CC-I C> UC-HAVE-OWN
       CC-J C> UC-HAVE-USE
       CC-K C> UC-HAVE-USE
       VB-HAVE-OWN (PI-B, PI-E)



The first line under E> gives the case frame, which includes two obligatory cases, an agent and a patient ("Bill rented (out) his lawnmower") and an optional beneficiary and measure (MSR) ("Bill rented his lawnmower to Tom for five dollars"). The second line under E> says that CC-A is a transaction; it thus conforms to 56) and it is necessary to state the equivalences between the PIs in 57) and those in 56). Below these PI equivalences it is also stated that the CC-B of the TRANSACTION definition (the transfer of money) is equivalent to CC-F of the "RENT-2" definition, while CC-C of the TRANSACTION definition (the transfer of the object) is equivalent to CC-G of "RENT-2". The two states of the first TRANSFER are named CC-H and CC-I, while the two states of the second TRANSFER are named CC-J and CC-K. It is then said that the measure, PI-D, must be something categorizable as a MEDIUM-OF-EXCHANGE: normally money, but potentially anything that would perform this function. The two states of the first TRANSFER are then both said to be instances of UC-HAVE-OWN, since the money actually changes ownership. The two states of the second TRANSFER, on the other hand, are instances of UC-HAVE-USE, since the object does not change ownership, but only use. The last line, like the next to last line of 55), says that the agent of the renting retains ownership of the object.

It was mentioned that the lexical entry for Japanese UC-"KAS-" is the same as that for English UC-"LEND", as in 55), except that the Japanese entry lacks the last line of 55) in which it is stipulated that lending cannot be a transaction. It can now be seen that UC-"KAS-" is compatible with both 55) and 57). We thus have a formal explanation for the fact that kasu may be translated as either lend or rent. In order to decide between the two translations, it is necessary to search the context in which this CC occurs to discover whether it is or is not a transaction. We will return to this matter in our discussion of translation in section VIII.
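The compatibility argument can be made concrete with a small sketch. It is a hypothetical Python illustration, not VAT's procedure: each category is reduced to a single feature recording whether it requires, excludes, or leaves open categorization as UC-TRANSACTION, and the table and function names are assumptions.

```python
# True: must be a transaction; False: cannot be one; None: left undecided.
TRANSACTION_FEATURE = {
    'UC-"KAS-"': None,
    'UC-"LEND"': False,
    'UC-"RENT-2"': True,
}

ENGLISH_CANDIDATES = ('UC-"LEND"', 'UC-"RENT-2"')


def english_candidates(source_uc, context_is_transaction=None):
    """List the English categories compatible with a Japanese category.

    context_is_transaction is what a search of the context found:
    True, False, or None when the context settles nothing.
    """
    known = TRANSACTION_FEATURE[source_uc]
    if known is None:
        known = context_is_transaction
    return [
        uc for uc in ENGLISH_CANDIDATES
        if known is None or TRANSACTION_FEATURE[uc] == known
    ]
```

With no contextual evidence both English categories remain possible for UC-"KAS-"; once the context shows that money changed hands, only UC-"RENT-2" survives.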

Lexical entries for categories whose instances are PIs are designed to elucidate the knowledge which is entailed by the assignment of a particular PI to some category. Such entries do not contain a case frame, but are otherwise similar in format to the entries for categories whose instances are CCs, as described above. As a simple example, we may note that when a PI is categorized as an instance of UC-"CAR" there is an entailment that this PI will "have" a trunk. This kind of having is different from those discussed in connection with transfers and transactions in the last section; we represent it with HAVE-AS-PART:

58) PI-A C> UC-"CAR"
    E>
       VB-HAVE-AS-PART (PI-A, PI-B)
       PI-B C> UC-"TRUNK"

It is useful here (and elsewhere in the lexicon) to distinguish between necessary entailments and expected entailments or default options. The latter constitute knowledge that is normally entailed by the category, but not necessarily so. We indicate entailments of this sort with a prefixed "E:". As an example we may note that something which has been categorized as a MEDIUM-OF-EXCHANGE (cf. 57)) is normally expected to be money, although in some circumstances it might be cowry shells or wampum:

59) PI-A C> UC-MEDIUM-OF-EXCHANGE
    E>
       E: PI-A C> UC-"MONEY"



A more complex example involves the categorization of a PI as an instance of UC-"BEAGLE". In this case we know that the PI is also categorizable as an instance of UC-"DOG", that we may expect that it will have a tail (although some dogs do not), that it will bark, and that it will chase cats:

60) PI-A C> UC-"BEAGLE"
    E>
       PI-A C> UC-"DOG"
       E: VB-HAVE-AS-PART (PI-A, PI-B)
       PI-B C> UC-"TAIL"
       E: VB-BARK (PI-A)
       E: VB-CHASE (PI-A, PI-C)
       PI-C C> UC-"CAT"



It may be that E: should be expressed as a probability; that is, that there is a continuous range over which we may expect something to be entailed, with necessary entailment being one extreme. At least for practical purposes, however, it proves useful to make a three-way distinction between necessary entailments (unmarked), default expectations (E:), and a third type which we call optional entailments and mark with "O:". These last represent a lower degree of probability; they are entailments which are neither necessary nor expected, but which are easily possible. For example, a bicycle need not have a basket and is not expected to have a basket, but it may very well have one:

61) PI-A C> UC-"BICYCLE"
    E>
       O: VB-HAVE-AS-PART (PI-A, PI-B)
       PI-B C> UC-"BASKET"

The distinction between necessary or expected and optional entailments is of interest when it comes to the assignment of definiteness, as discussed in the following section.
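The three-way distinction lends itself to a simple tagged representation. The sketch below is a hypothetical Python illustration of how a lexicon line might be split into its strength marker and its statement; the `Strength` type and `parse_entailment` function are assumptions, not part of VAT.

```python
from enum import Enum


class Strength(Enum):
    NECESSARY = ""    # unmarked in the lexicon
    EXPECTED = "E:"   # default expectation
    OPTIONAL = "O:"   # neither necessary nor expected, but easily possible


def parse_entailment(line):
    """Split a line found under E> into (strength, statement)."""
    text = line.strip()
    for strength in (Strength.EXPECTED, Strength.OPTIONAL):
        if text.startswith(strength.value):
            return strength, text[len(strength.value):].strip()
    return Strength.NECESSARY, text
```

A probabilistic treatment, as the text suggests, would replace the three discrete tags with a number between zero and one, with the necessary entailments at one extreme.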



VII. Discourse Information and Readjustments 

A speaker needs access to three major classes of information in order to verbalize successfully. First, of course, he must have an idea of what he wants to talk about: the content of the verbalization. Second, he must have access to general knowledge that is relevant, the kind of knowledge that we are attempting to characterize in the lexicon. But there is a third kind also. The speaker must keep track of knowledge having to do with the very fact that he is verbalizing: knowledge about the speech act itself, and its effect on the person his verbalization is addressed to. It is this third kind of knowledge that we are calling discourse information. We are concerned in this area with such factors as the identity and social relationship of the speaker and the addressee, the time and place of the speech act, and factors which relate the content of the discourse to what is assumed to be going on in the mind of the addressee. Sometimes, moreover, it is important to keep track of the act of verbalization as an event in itself, since the verbalization may be talked about or referred to subsequently in the discourse. Discourse information is kept by VAT in temporary storage. Unlike information in the lexicon, it is specific to and even changeable within a particular discourse rather than being potentially applicable to an unlimited number of different discourses.



Our treatment of discourse information is still rudimentary and uneven. So far as speaker and addressee are concerned, we simply enter into discourse information storage statements like the following:

62) SP-SPEAKER (PI-1001)
    SP-ADDRESSEE (PI-1002)

(The prefix SP stands for "system predicate"; it is used for a variety of predicates associated with discourse information.) The program makes use of this information in various ways. For example, in deciding how to lexicalize PI-1001 and PI-1002 VAT makes use of information like that in 62) in order to answer questions like 41) and 43) in section V above.

Probably in most languages to some degree, but especially in many Asian languages, the social relationship between the speaker and addressee plays a role of some kind in verbalization. We have been interested in introducing such considerations into our verbalization procedure, and have so far concentrated on the question of how VAT should decide to categorize in Japanese a CC which in English would be categorized as an instance of UC-"GIVE". There are several categories in the Japanese lexicon, all of which conform to the definition of UC-"GIVE" in 54) above, but which differ from each other with respect to the speaker-addressee relationship. How the choice can be made is most easily illustrated in the context of a translation procedure, and we will return to this example in section IX.

VAT does little at present with considerations of the time and place of the speech act. Statements like the following can be included with discourse information:

63) SP-HERE (PL-1357)
    SP-NOW (PT-1579)

(where PL stands for "particular location" and PT for "particular time"). Whether PL-1357 and PT-1579 remain throughout the discourse or are replaced by other places and times depends on the nature of the discourse itself; sometimes there will be significant changes in these parameters and sometimes not. In any case it is possible for VAT to answer questions about tense, for example, by asking whether the time of a CC that is being verbalized is before or after, or whether it includes, the time which has been specified as NOW, such as PT-1579 in 63).
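The comparison with NOW can be sketched as follows. This is a hypothetical Python illustration only: it assumes that times can be modeled as numbers and that the time of a CC is an interval, and the labels "PRESENT" and "FUTURE" are assumptions added for the example (only "PAST" appears in the report's representations).

```python
def tense(event_start, event_end, now):
    """Classify the time of a CC relative to the time specified as NOW.

    Hypothetical sketch: times are plain numbers and the CC occupies the
    closed interval [event_start, event_end].
    """
    if event_end < now:
        return "PAST"       # the CC is entirely before NOW
    if event_start > now:
        return "FUTURE"     # the CC is entirely after NOW
    return "PRESENT"        # the CC's time includes NOW
```

If SP-NOW is replaced by a different time during the discourse, the same comparison simply runs against the new value.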

Discourse information is subject to change as the discourse proceeds. The way in which VAT presently accomplishes such changes is through readjustment processes, applied immediately after each sentence has been completely verbalized. These readjustments specify the ways in which the store of discourse information has been affected by the sentence. One of them, for example, creates a CC which is the concept of the event of producing the sentence itself, which subsequently can be treated like any other event. Everything involved in the verbalization of that sentence belongs to the content of this CC. If, for example, the speaker subsequently has reason to repeat what he originally said, he may verbalize in exactly the same way (quote himself directly), or he may "say the same thing in different words" by making different choices in categorization and so on. The relevant information is available within the CC that represents the original verbalization.

Another readjustment has to do with the establishment of "givenness" for items communicated in the sentence. For each PI-A, for example, there will be, when the sentence has been completely verbalized, a readjustment process stateable as:

64) SP-GIVEN (PI-A)

If, for example, the sentence in question was "Mrs. Brown gave Tommy a cookie", and Mrs. Brown, Tommy, and the cookie are PI-1234, PI-1345, and PI-1456 respectively, then readjustments after the production of this sentence will create the statements:

65) SP-GIVEN (PI-1234)
    SP-GIVEN (PI-1345)
    SP-GIVEN (PI-1456)

If any or all of these PIs occur in the next sentence, they will be pronominalized, and it will not be necessary for VAT to ask the user a question like 40) above. Thus, the next sentence might be "He took them from her gratefully."

It is difficult to decide when statements like those in 65) should be deleted from the store of discourse information, that is, when givenness evaporates. After a certain period of time has elapsed in which the PI has not been talked about or otherwise kept in the addressee's consciousness, the speaker will probably no longer pronominalize it. At present we let statements like those in 65) remain only through the following sentence. Thus if PI-1234, for example, does not occur in the next sentence it will not be treated as given two sentences later, and will not be pronominalized. Not all discourse works in this way, but this device provides a useful temporary approximation.
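The one-sentence lifetime of givenness can be sketched as a small store that is overwritten after every sentence. This is a hypothetical Python illustration rather than VAT's implementation; the class name, its methods, and the `always_given` parameter (for participants such as speaker and addressee that might be treated as permanently given) are assumptions.

```python
class GivennessStore:
    """Holds SP-GIVEN statements, kept only through the following sentence."""

    def __init__(self, always_given=()):
        self.always_given = set(always_given)
        self.recent = set()

    def is_given(self, pi):
        return pi in self.always_given or pi in self.recent

    def readjust(self, pis_in_sentence):
        # Applied after a sentence has been completely verbalized: the PIs
        # of that sentence become given, and PIs that did not occur in it
        # lose their givenness.
        self.recent = set(pis_in_sentence)


store = GivennessStore()
store.readjust(["PI-1234", "PI-1345", "PI-1456"])   # "Mrs. Brown gave Tommy a cookie"
given_in_second_sentence = store.is_given("PI-1234")
store.readjust(["PI-1345"])                          # a sentence about Tommy only
given_in_third_sentence = store.is_given("PI-1234")
```

A longer-lived notion of givenness could be approximated by keeping a per-PI counter of sentences since the last mention instead of a single set.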



A rather similar kind of readjustment has to do with the establishment of a relation between a UC and a PI which we call SP-IDENTIFIES. The presence of this relation eventually leads to the lexicalization of the PI with the definite article. Suppose the speaker says "I bought a bicycle yesterday." During the verbalization of this sentence VAT will have created the statement:

66) PI-1987 C> UC-"BICYCLE"

That is, PI-1987 has been categorized as an instance of UC-"BICYCLE". This statement then triggers a readjustment process which creates the discourse information:

67) SP-IDENTIFIES (UC-"BICYCLE", PI-1987)

which means that when he is presented with something that is lexicalized as an instance of UC-"BICYCLE", the addressee can be expected to know what particular instance it is (in this case PI-1987). When, during a later sentence, VAT comes to the question:

68) V: DOES UC-"BICYCLE" IDENTIFY PI-1987?

as in 50) above, it is in a position to provide its own answer without recourse to the user. Thus it will, on its own initiative, lexicalize PI-1987 with the definite article: NN-"BICYCLE" / AR-"THE". It is in ways such as this that we are attempting to increase VAT's ability to answer its own questions.

As in the case of givenness, the question arises as to when a statement like 67) should be deleted from the store of discourse information. All that is clear now is that such statements generally last longer than SP-GIVEN statements, and for the moment we do not delete SP-IDENTIFIES statements before the end of the discourse. It is undoubtedly the case, however, that some of them should be deleted sometimes, and we will also need to deal eventually with discourses in which there are multiple instances of the same category: "the first bicycle, the second bicycle, etc."

The presence of lexical information of the type that was described at the end of section VI has an interesting and desirable effect on readjustments, specifically with respect to statements like 67). As an example, we might have a lexical entry for UC-"BICYCLE" which includes:

69) PI-A C> UC-"BICYCLE"
    E>
       VB-HAVE-AS-PART (PI-A, PI-B)
       PI-B C> UC-"FRAME"
       O: VB-HAVE-AS-PART (PI-A, PI-C)
       PI-C C> UC-"BASKET"



That is, something categorized as an instance of UC-"BICYCLE" has as a necessary part something categorizable as an instance of UC-"FRAME", and also has as an optional part something categorizable as an instance of UC-"BASKET". Now, it may be noted that the second line under E>, which deals with the categorization of PI-B, is a statement like that in 66) above. After a sentence like "I bought a bicycle yesterday" has been produced, this line will therefore trigger a readjustment process which creates the statement:

70) SP-IDENTIFIES (UC-"FRAME", PI-1468)

(with whatever number it is appropriate to assign to this PI). As a consequence, if PI-1468 occurs in a subsequent sentence it will be lexicalized with the definite article, as in "The frame is extra large." Thus, as is desirable, definiteness is created not only for instances of the category first mentioned, but also through entailments of that category. It should also be noted that in this context it is a little odd to say "The basket is extra large", talking about PI-C. One would be more likely to say "It has a basket which is extra large", or in some other way to introduce the basket explicitly. In other words the process just described works better for necessary parts than for optional parts of the first-mentioned object (PI-A). We therefore exclude from this readjustment process PIs that have been introduced through optional entailments.
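The readjustment that creates SP-IDENTIFIES statements, including those for entailed parts, can be sketched as follows. This is a hypothetical Python illustration: the `PARTS` table is a reduced stand-in for entry 69), and the function name, the dictionary used as the discourse store, and the counter that supplies fresh PI numbers are all assumptions.

```python
from itertools import count

# Part entailments per category: (strength marker, category of the part).
# "" marks a necessary entailment, "O:" an optional one.
PARTS = {'UC-"BICYCLE"': [("", 'UC-"FRAME"'), ("O:", 'UC-"BASKET"')]}


def readjust_identifies(uc, pi, identifies, fresh_numbers):
    """Record SP-IDENTIFIES for a newly categorized PI and its entailed parts.

    identifies maps a category to the PI it currently identifies.
    Parts introduced through optional entailments are skipped.
    """
    identifies[uc] = pi
    for strength, part_uc in PARTS.get(uc, []):
        if strength != "O:":
            identifies[part_uc] = f"PI-{next(fresh_numbers)}"
    return identifies


discourse = readjust_identifies('UC-"BICYCLE"', "PI-1987", {}, count(1468))
```

After "I bought a bicycle yesterday", the store in this sketch identifies both the bicycle and its frame, so "the frame" can be used at once, while the basket still has to be introduced explicitly.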






VIII. Translation 

The general nature of the translation procedure was outlined in section I, and diagramed in Figure 1. To summarize again, VAT will start with a text in the source language, will reconstruct the verbalization processes which produced that text, and will then itself produce a parallel verbalization in the target language. During this last procedure it will apply syntactic processes appropriate to the target language whenever it can, but at each of those many points where it must make a choice of some kind it will look across to the source language verbalization to see what choice was made there. If possible it will equate that choice directly with a corresponding choice in the target language. If no direct correspondence is available, it will compare the lexicons of the two languages to determine what correspondences are possible, and will then search the context to decide which of them should be chosen. We will be particularly concerned in this section with illustrating a case in which such a complex choice must be made, in which the zigzag arrows in Figure 1 have considerable content. First, however, it may be useful to provide a framework by illustrating a relatively simple case where the correspondences are more direct. We will use as our first example the following brief text from Japanese:

71) Reizooko o utta. Okane ga hituyoo datta kara.
    refrigerator sold money needed was because

We will want to consider some of the procedures VAT will follow in translating this sentence into English:

72) I sold the refrigerator. I needed the money.

Actually our attention in this example will focus on the first sentence, since we will later want to consider the complications that are added by changing the verb in the first sentence from utta 'sold' to kasita 'rented' or 'lent'.

Let us first review the manner in which VAT will reconstruct the original verbalization of the Japanese text. Since our eventual parsing component will follow a kind of "analysis by synthesis" procedure, we will also be suggesting here the steps of the parsing program. The only difference, and of course it is a big one, is that for the moment VAT will ask that decisions be made by the user instead of itself deriving them from the text together with its own knowledge of the world. The conversation with the user will proceed as follows:



 1. V: WHAT VAT TASK DO YOU WANT PERFORMED?
 2. U: VERBALIZE CC-2001
 3. V: WHAT IS THE GENRE?
 4. U: OPEN
 5. V: CAN CC-2001 BE CATEGORIZED?
 6. U: NO
 7. V: HOW IS CC-2001 SUBCONCEPTUALIZED?
 8. U: REASON (CC-2002, CC-2003)

As explained for example 9) in section II, and with the proper insertion of periods, VAT's representation now is:

    CC-2003
    CJ-"."
    CC-2002
    CJ-"KARA"
    CJ-"."



 9. V: CAN CC-2003 BE CATEGORIZED?
10. U: YES
11. V: IS CC-2003 GENERIC?
12. U: NO



VAT's representation, as explained for example 32) in section IV, now includes:

    CC-2003 / "PAST"
    ...

13. V: HOW IS CC-2003 CATEGORIZED?
14. U: UR-



VAT finds UC-"UR-" in the Japanese lexicon. The first three lines of this entry are:

    CC-A C> UC-"UR-"
    E>
       CC-A F> VB-"UR-" (PI-B↑AGT, ?PI-C↑BEN, ?PI-D↑MSR, PI-E↑PAT)






As in example 34) in section IV, VAT creates the representation:

VB-"UR-" / "PAST"
PI-B↑AGT
?PI-C↑BEN
?PI-D↑MSR
PI-E↑PAT
CJ-"."
CC-2002
CJ-"KARA"
CJ-"."



Since the beneficiary and measure PIs are optional, VAT next
asks:

15. V: IS THE BENEFICIARY EXPLICIT?

16. U: NO

17. V: IS THE MEASURE EXPLICIT?

18. U: NO

The next two questions are:

19. V: WHAT IS THE AGENT?

20. U: PI-2001

21. V: WHAT IS THE PATIENT?

22. U: PI-2003



VAT now has the following representation (cf. 36) above):

VB-"UR-" / "PAST"
PI-2001↑AGT
PI-2003↑PAT
CJ-"."
CC-2002
CJ-"KARA"
CJ-"."






VAT next asks:



23. V: WHAT IS THE TOPIC? 

24. U: PI-2001 



whereupon for Japanese it presently creates the structure: 



VB-"UR-" / "PAST"
PI-2001 / "GA"
PI-2003 / "O"
CJ-"."
CC-2002
CJ-"KARA"
CJ-"."

VAT is now at a point where it can lexicalize PI-2001 and
PI-2003. Beginning with PI-2001, it might ask first:

25. V: IS PI-2001 GIVEN?

26. U: YES



In fact, however, we assume that the speaker (and addressee) are
automatically given, so that VAT contains a general entailment
to the effect that:

SP-SPEAKER (PI-A)
E>
SP-GIVEN (PI-A)

Since by convention PI-2001 is the speaker, the following is
already stored as discourse information:

SP-GIVEN (PI-2001)

Thus VAT was able to give an affirmative answer to question 25



a complex matter, depending in part on social relationships, and
we have not as yet constructed a procedure to introduce the
correct pronoun for a PI that is given. We have, however, taken
advantage of the simple fact that given PIs are very often
deleted, with no surface representation at all. In the present
example, and in many others, the simple deletion of such a PI
produces the correct result, so that an affirmative answer to
question 25 leads to the representation:



VB-"UR-" / "PAST"
PI-2003 / "O"
CJ-"."
CC-2002
CJ-"KARA"
CJ-"."



VAT now turns its attention to PI-2003: 



27. V: IS PI-2003 GIVEN?

28. U: NO

29. V: DOES PI-2003 HAVE A NAME?

30. U: NO

31. V: HOW IS PI-2003 CATEGORIZED?

32. U: REIZOOKO



(We omit here considerations of cardinality.) The representation
now is:

VB-"UR-" / "PAST"
NN-"REIZOOKO" / "O"
CJ-"."
CC-2002
CJ-"KARA"
CJ-"."



The first three lines of the above are actually as far as we go
at the present time in the surface representation of a sentence.
We try to include in such a representation everything that is
needed to arrive at a correct linear sequence of words. In this
case the combination VB-"UR-" / "PAST" will yield the surface
word utta, which will be placed in sentence-final position
(followed by the period). That leaves reizooko o as the first words
in the sentence.
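The step from surface representation to a linear sequence of words can be sketched as follows. This is a minimal illustration under assumptions, not the report's procedure: the lookup table `SURFACE_VERBS` and the function `linearize_japanese` are hypothetical names, and the table stands in for whatever morphological rules VAT would actually use to obtain utta from UR- plus PAST.

```python
# Hypothetical stem/tense table; a real system would derive these forms
# by morphological rule rather than by listing them.
SURFACE_VERBS = {("UR-", "PAST"): "utta", ("KAS-", "PAST"): "kasita"}


def linearize_japanese(verb, nouns):
    """Place noun + particle pairs first and the verb sentence-finally.

    verb  -- (stem, tense) pair, e.g. ("UR-", "PAST")
    nouns -- list of (noun, particle) pairs in their surface order
    """
    words = []
    for noun, particle in nouns:
        words.extend([noun.lower(), particle.lower()])
    words.append(SURFACE_VERBS[verb])  # verb goes last in Japanese
    return " ".join(words).capitalize() + "."


sentence = linearize_japanese(("UR-", "PAST"), [("REIZOOKO", "O")])
```

With the representation reached above, the call yields the first sentence of example 71).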

VAT would next ask about CC-2002, but we will not carry the
verbalization process further here. We are interested in how
just this much of the text will be translated into English. By
and large VAT will ask the same questions it asked in the course
of the Japanese verbalization. It will look for the answers in
the answers that were given there, and when possible will apply
corresponding answers in English. Along the way, whenever
appropriate, it will apply syntactic processes that are called for
by the structure of English. The translation, then, begins with
the same question that began the verbalization in Japanese:

V: WHAT VAT TASK DO YOU WANT PERFORMED? 

The answer given in line 2 above was VERBALIZE CC-2001. The
English translation must use its own four digit numbers; in
what follows we will simply substitute the English digit "1" for
the Japanese digit "2":

U: VERBALIZE CC-1001

Of course here as elsewhere this question is not actually asked
of the user, but is answered internally by VAT. The next
questions exactly parallel lines 3-8 above:



V: WHAT IS THE GENRE?
U: OPEN

V: CAN CC-1001 BE CATEGORIZED?
U: NO

V: HOW IS CC-1001 SUBCONCEPTUALIZED?
U: REASON (CC-1002, CC-1003)

We assume that English would not in this case use the word
because, but simply juxtapose the two sentences, as in example
8) in section II. Thus the representation now is:

CC-1003
CJ-"."
CC-1002
CJ-"."

Lines 9-13 of the Japanese verbalization have a direct
correspondence:

V: CAN CC-1003 BE CATEGORIZED?
U: YES

V: IS CC-1003 GENERIC?
U: NO

V: HOW IS CC-1003 CATEGORIZED?

At this point the Japanese answer was UR-. That is, the
categorization was in terms of the Japanese category UC-"UR-". It
is necessary to find an English category that corresponds. The
procedure at this point is to look first in a stored list of
bilingual category equivalences which we call interlingua. The
entries in interlingua are of the following sort:

Japanese    English

UR-         SELL
HON         BOOK



That is, the list contains pairs of categories, where the
members of each pair are assumed to categorize what is, for all
practical purposes, identical content. The assumption is that
if a CC can be categorized as an instance of UC-"UR-" in
Japanese it can also be categorized as an instance of UC-"SELL" in
English, and vice versa. Similarly, Japanese UC-"HON" and
English UC-"BOOK" are equivalent categories. As a general strategy
we expect that pairs will gradually be removed from interlingua
as differences between the paired categories are discovered.
Linguistic research has not yet progressed to the point that we
can say with complete certainty that any two categories from two
different languages embrace exactly the same content. At the
outset, however, it is useful at least to pretend that UC-"UR-"
and UC-"SELL" are equivalent, and probably there are at least
some pairs in interlingua that will remain viable for some time.

The present example was chosen because the answer to the
last question above can be found in interlingua. Later we will
consider a case where it cannot. At this point VAT answers its
own question with:

U: SELL

then looks at the lexical entry for UC-"SELL" (which we assume
does not differ from that for UC-"UR-"), and creates the
representation:



VB-"SELL" / "PAST"
PI-B↑AGT
?PI-C↑BEN
?PI-D↑MSR
PI-E↑PAT
CJ-"."
CC-1002
CJ-"."
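The interlingua lookup used to answer the categorization question can be sketched as a pair of dictionaries. The sketch is an assumption about one convenient representation, not a description of VAT's storage; the names `INTERLINGUA_PAIRS` and `find_equivalent` are hypothetical, and only the two pairs listed in the text are included.

```python
# Category pairs taken from the interlingua sample in the text.
INTERLINGUA_PAIRS = [("UR-", "SELL"), ("HON", "BOOK")]

# The pairs are assumed equivalent in both directions ("and vice versa").
JA_TO_EN = dict(INTERLINGUA_PAIRS)
EN_TO_JA = {english: japanese for japanese, english in INTERLINGUA_PAIRS}


def find_equivalent(category, source_language):
    """Return the paired category, or None if interlingua has no entry."""
    table = JA_TO_EN if source_language == "japanese" else EN_TO_JA
    return table.get(category)


answer = find_equivalent("UR-", "japanese")
```

A result of `None` corresponds to the case, taken up later in the section, where interlingua does not provide the answer and the lexical entailments must be compared instead.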



The questions and answers which parallel lines 15-22 of the
Japanese verbalization are straightforward:

V: IS THE BENEFICIARY EXPLICIT?
U: NO

V: IS THE MEASURE EXPLICIT?
U: NO

V: WHAT IS THE AGENT?
U: PI-1001

V: WHAT IS THE PATIENT?
U: PI-1003



The representation now is:

VB-"SELL" / "PAST"
PI-1001↑AGT
PI-1003↑PAT
CJ-"."
CC-1002
CJ-"."

The next exchange is: 



V: WHAT IS THE TOPIC? 



U: PI-1001 



which creates the representation:

VB-"SELL" / "PAST"
PI-1001↑SUBJ
PI-1003↑DO
CJ-"."
CC-1002
CJ-"."



With the lexicalization of PI-1001 the procedure is different in
English, since this item cannot simply be deleted as in the
Japanese. We follow the questions illustrated in examples 40)
through 43) in section V:

V: IS PI-1001 GIVEN?
U: YES

V: IS PI-1001 THE ADDRESSEE?
U: NO

V: WHAT IS THE CARDINALITY OF PI-1001?
U: 1

V: IS PI-1001 THE SPEAKER?
U: YES

Thus the representation now is:

VB-"SELL" / "PAST"
NN-"I"↑SUBJ
PI-1003↑DO
CJ-"."
CC-1002
CJ-"."

Now comes the lexicalization of the direct object, PI-1003. The
initial questions parallel lines 27-31 of the Japanese
verbalization:

V: IS PI-1003 GIVEN?
U: NO

V: DOES PI-1003 HAVE A NAME?
U: NO

V: HOW IS PI-1003 CATEGORIZED?

The Japanese answer was REIZOOKO. VAT will now look in
interlingua to see whether that item is there, and we assume that it
will be found paired with English REFRIGERATOR. Although
Japanese was able to terminate the verbalization of PI-2003 at this
point, English must ask the question introduced in example 50)
of section V:

V: DOES UC-"REFRIGERATOR" IDENTIFY PI-1003?
U: YES



The answer depends on the context, but let us assume that it is
yes. The representation now is:

VB-"SELL" / "PAST"
NN-"I"↑SUBJ
NN-"REFRIGERATOR" / AR-"THE"↑DO
CJ-"."
CC-1002
CJ-"."

We now have the kind of representation of the first sentence
that is our current goal. Normal English word order will
put the subject first, the verb second, and the direct object
last to yield the final representation "I sold the refrigerator"
of 72).

The above example was chosen to illustrate a maximally simple
case of translation: one in which, in particular, the
answers to all questions about cross-language categorization could
be found in interlingua. The interesting cases, however, are
those in which interlingua does not provide all the answers. It
is in those cases that the zigzag arrows of Figure 1 in section
I must be further elaborated. The general method of elaboration
is suggested in Figure 4. Assume that we are producing a
verbalization in the target language and, coming down from the
upper righthand corner, we arrive at a point where a CC or PI
needs to be categorized. Following arrow 1, we look across to
the source language verbalization to find that the corresponding
CC or PI was categorized in a certain way, let us say as an
instance of category A. We look next at interlingua (arrow 2).






If A were there, we would take the target language category
paired with it (such as SELL and REFRIGERATOR in the example
above), introduce it into the target language verbalization,
and proceed. Now, however, we are considering those cases in
which A is not found in interlingua. The next step, following
arrow 3, is to look at the entailments of A in the source
language lexicon. We next follow arrow 4 to search the target
language lexicon for entries whose entailments are compatible with
those of A. (This search procedure is likely to present
challenging problems when the source language lexicon reaches any
interesting size. It is, however, facilitated by the presence
of abstract features like TRANSFER and TRANSACTION which can be
used to limit the domain of search.) Suppose that we find two
entries in the target language lexicon, B and C, both of whose
entailments are compatible with the entailments of A. We then
look to see how the entailments of B and C differ and find, let
us say, that B contains entailment(s) X, whereas C contains
entailment(s) Y. We then follow arrow 5 back to the source
language verbalization, hoping to find something in it that will
allow us to choose between X and Y. (Again there are
challenging problems in searching the source language text for the
answer, problems that we have hardly begun to deal with.) Let
us now assume that we find something in the source language text
that is compatible with X but not with Y. We are then able to
choose B as the correct target language category. We introduce
that category into the target language verbalization via arrow
6 and proceed. In those cases where the choice between X and Y
(and hence between B and C) cannot be made, where the source
language text does not provide the answer, VAT must resort to
asking the user for the correct categorization.
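The six-arrow procedure can be summarized in a short Python sketch. It is a simplification under stated assumptions: entailments are modelled as flat sets of labels, "compatible" is taken to mean "contains at least the source entailments", and "compatible with X" is reduced to "X is among the facts recovered from the source text". The function and parameter names are hypothetical and do not come from VAT.

```python
def choose_target_category(source_category, interlingua, source_lexicon,
                           target_lexicon, source_facts, ask_user):
    """Pick a target-language category for one source-language category."""
    # Arrow 2: a direct interlingua pairing settles the matter.
    if source_category in interlingua:
        return interlingua[source_category]

    # Arrow 3: entailments of the source category.
    required = source_lexicon[source_category]

    # Arrow 4: target entries whose entailments include the required ones.
    candidates = {name: ents for name, ents in target_lexicon.items()
                  if required <= ents}
    if len(candidates) == 1:
        return next(iter(candidates))

    # Arrow 5: look for each candidate's distinguishing entailments
    # among the facts recovered from the source verbalization.
    viable = []
    for name, ents in candidates.items():
        others = set()
        for other_name, other_ents in candidates.items():
            if other_name != name:
                others |= other_ents
        distinguishing = ents - others
        if distinguishing and distinguishing <= source_facts:
            viable.append(name)

    # Arrow 6: carry the choice back, or fall back on the user.
    if len(viable) == 1:
        return viable[0]
    return ask_user(sorted(candidates))


# Illustrative data only; the labels are simplified stand-ins.
KAS = {"TRANSFER", "HAVE-USE", "OWNER-KEEPS"}
source_lexicon = {"KAS-": KAS}
target_lexicon = {
    "LEND": KAS | {"NOT-TRANSACTION"},
    "RENT-2": KAS | {"TRANSACTION"},
    "SELL": {"TRANSFER", "TRANSACTION", "HAVE-OWN"},
}
choice = choose_target_category("KAS-", {"UR-": "SELL"}, source_lexicon,
                                target_lexicon, {"TRANSACTION"},
                                ask_user=lambda names: names[0])
```

The real difficulty, as the text notes, lies in obtaining `source_facts` from the source verbalization; the sketch simply takes them as given.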

We will illustrate this procedure with the brief Japanese
text:

73) Reizooko o kasita. Okane ga hituyoo datta kara.
    refrigerator rented money needed was because

We will want VAT to translate these two sentences into English:

74) I rented the refrigerator. I needed the money.

We are not concerned in this example with the fact that the
first English sentence is ambiguous between rented (to someone)
and rented (from someone), but with the fact that the first
Japanese sentence is ambiguous between rented and lent. In both
cases, it seems, the second sentence serves to disambiguate.
What we are interested in now is the fact that VAT must somehow
choose between RENT and LEND as the proper correspondent for
Japanese KAS-.

We can assume that most of the verbalization in both
languages proceeds along the lines already exemplified, since 71)
and 73) are minimally different. Imagine, then, that we have
arrived at the point in the English verbalization where the
question is:

V: HOW IS CC-1003 CATEGORIZED?

We are now in the upper right of Figure 4, and we follow arrow
1 to find that the corresponding CC in the Japanese
verbalization was categorized in terms of UC-"KAS-". We then follow
arrow 2, and find that KAS- is not in interlingua. We look next
via arrow 3 at the entailments of UC-"KAS-" and find that they
are as specified in example 55), section VI above, but without
the last line of that example:

75) CC-A C> UC-"KAS-"
    E>
    CC-A F> VB-"KAS-" (PI-B↑AGT, ?PI-C↑BEN, PI-D↑PAT)
    CC-A C> UC-TRANSFER
        PI-D = PI-B
        PI-F = PI-C
        PI-E = PI-D
        CC-B = CC-E
        CC-C = CC-F
    CC-E C> UC-HAVE-USE
    CC-F C> UC-HAVE-USE
    VB-HAVE-OWN (PI-B, PI-D)
Substituting four digit numbers for the variables, we obtain: 

76) CC-2003 C> UC-"KAS-"
    E>
    CC-2003 F> VB-"KAS-" (PI-2001↑AGT, ?PI-2902↑BEN, PI-2003↑PAT)
    CC-2003 C> UC-TRANSFER
        PI-D = PI-2001
        PI-F = PI-2902
        PI-E = PI-2003
        CC-B = CC-2905
        CC-C = CC-2906
    CC-2905 C> UC-HAVE-USE
    CC-2906 C> UC-HAVE-USE
    VB-HAVE-OWN (PI-2001, PI-2003)



(PI-2902, CC-2905, and CC-2906 have been inserted here as
arbitrary numbers. It is quite possible, however, that these are
items which show up explicitly elsewhere in the Japanese
verbalization. For example, PI-2902, the one who receives the
refrigerator, might well be mentioned elsewhere in the text.)
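The substitution of four digit numbers for variables is a purely mechanical step and can be sketched with a regular expression. The function name `instantiate` and the choice of a dictionary for the bindings are assumptions made for illustration; the three sample lines are taken from example 75) and the bindings from example 76).

```python
import re

# Matches variables such as PI-B or CC-E, but not numbered items
# (PI-2001) or category names (UC-HAVE-USE).
VARIABLE = re.compile(r"\b(?:PI|CC)-[A-Z]\b")


def instantiate(entry_lines, bindings):
    """Replace each bound variable in a lexical entry with its number."""
    def substitute(match):
        name = match.group(0)
        return bindings.get(name, name)  # unbound variables are left alone
    return [VARIABLE.sub(substitute, line) for line in entry_lines]


template = ['CC-A C> UC-"KAS-"',
            "CC-E C> UC-HAVE-USE",
            "VB-HAVE-OWN (PI-B, PI-D)"]
bindings = {"CC-A": "CC-2003", "CC-E": "CC-2905",
            "PI-B": "PI-2001", "PI-D": "PI-2003"}
instantiated = instantiate(template, bindings)
```

Leaving unbound variables untouched matches the report's practice of assigning arbitrary numbers only when an item has not already shown up in the verbalization.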

Since CC-2003 involves a transfer, VAT must also assign
numbers within the definition of UC-TRANSFER, given in section
VI above as example 52):

77) CC-2003 C> UC-TRANSFER
    E>
    CC-2003 S> CJ-CHANGE (CC-2905, CC-2906)
    CC-2905 F> VB-HAVE (PI-2001, PI-2003)
    CC-2906 F> VB-HAVE (PI-2902, PI-2003)

Thus there is a change from the renter or lender (PI-2001)
having the object (PI-2003) to the rentee or borrower (PI-2902)
having it. The last three lines of 76) made it clear that this
was not a change in ownership but only a change in use, and that
PI-2001 retains ownership throughout.

Following arrow 4, we carry these entailments across to
the English lexicon and search for entries whose entailments
are compatible with 76). Compatibility means that those entries
will contain what is in 76), but may also contain more. Let us
say that we find two such entries, one for the category
UC-"LEND", which was given in section VI above, and one for
UC-"RENT-2", which was given in 57).

The next step is to isolate the differences between
UC-"LEND" and UC-"RENT-2". UC-"LEND", as mentioned, differs from
75) in containing an additional final line:

78) CC-A -C> UC-TRANSACTION

That is, CC-A cannot be categorized as a transaction.
UC-"RENT-2", on the other hand, contains the statement:

79) CC-A C> UC-TRANSACTION

At one level of abstraction the question which must be answered,
therefore, is whether CC-1003 is or is not a transaction.
Informally, this is a matter of whether PI-2001, the renter or
lender, did or did not receive money in exchange for the
transfer of use of the object.

The following digits can be inserted for the variables in
the lexical entry for UC-"RENT-2":

80) CC-1003 C> UC-"RENT-2"
    E>
    CC-1003 F> VB-"RENT" (PI-1001↑AGT, ?PI-1901↑BEN,
                          ?PI-1902↑MSR, PI-1003↑PAT)
    CC-1003 C> UC-TRANSACTION
        PI-F = PI-1001
        PI-D = PI-1901
        PI-E = PI-1902
        PI-G = PI-1003
        CC-B = CC-1901
        CC-C = CC-1902
    CC-1901 C> UC-TRANSFER
        CC-B = CC-1903
        CC-C = CC-1904
    CC-1902 C> UC-TRANSFER
        CC-B = CC-1905
        CC-C = CC-1906
    PI-1902 C> UC-MEDIUM-OF-EXCHANGE
    CC-1903 C> UC-HAVE-OWN
    CC-1904 C> UC-HAVE-OWN
    CC-1905 C> UC-HAVE-USE
    CC-1906 C> UC-HAVE-USE
    VB-HAVE-OWN (PI-1001, PI-1003)



What all this says is that the categorization of CC-1003 as an
instance of UC-"RENT-2" involves a number of things. First,
there must be a person who does the renting out (PI-1001), a
person who receives the rented object (PI-1901), the money that
is paid in rent (PI-1902), and the rented object itself
(PI-1003). Furthermore, CC-1003 is said to be a transaction, and
certain equivalences are stated between the RENT-2 definition
and the TRANSACTION definition. VAT must therefore assign these
particular PI and CC numbers within the definition of
UC-TRANSACTION, which was given as example 56) in section VI above:



81) CC-1003 C> UC-TRANSACTION
    E>
    CC-1003 S> CJ-PURPOSE (CC-1901, CC-1902)
    CC-1901 C> UC-TRANSFER
        PI-D = PI-1901
        PI-E = PI-1902
        PI-F = PI-1001
    CC-1902 C> UC-TRANSFER
        PI-F = PI-1901
        PI-E = PI-1003
        PI-D = PI-1001



This says that CC-1003 can be paraphrased as two transfers,
CC-1901 and CC-1902, the first of which was for the purpose of the
second. (CC-1901 is the transfer of money, and CC-1902 the
transfer of the rented object.) VAT must, therefore, look also
at the definition of UC-TRANSFER, given in section VI above as
example 52), and introduce again the proper PI and CC numbers
for each of these particular transfers. The first of them will
be represented as:

82) CC-1901 C> UC-TRANSFER
    E>
    CC-1901 S> CJ-CHANGE (CC-1903, CC-1904)
    CC-1903 F> VB-HAVE (PI-1901, PI-1902)
    CC-1904 F> VB-HAVE (PI-1001, PI-1902)



That is, the first transfer involves a change from CC-1903 to
CC-1904. In CC-1903 the rentee (PI-1901) has the money
(PI-1902), and in CC-1904 the renter (PI-1001) has it. The second
transfer is represented as:

83) CC-1902 C> UC-TRANSFER
    E>
    CC-1902 S> CJ-CHANGE (CC-1905, CC-1906)
    CC-1905 F> VB-HAVE (PI-1001, PI-1003)
    CC-1906 F> VB-HAVE (PI-1901, PI-1003)



Here there is a change from CC-1905 to CC-1906. In CC-1905 the
renter (PI-1001) has the object to be rented (PI-1003), and in
CC-1906 the rentee (PI-1901) has it.

In 80) it is also stated that PI-1902 can be categorized as
an instance of MEDIUM-OF-EXCHANGE, in all probability therefore
an instance of UC-"MONEY" (see example 59) in section VI above).
Furthermore it is stated that the change in the having of the
money (from CC-1903 to CC-1904) involves a change in ownership,
whereas the change in the having of the rented object (from
CC-1905 to CC-1906) involves a change in use. Finally, it is
stated that the renter (PI-1001) retains ownership of the rented
object throughout.

What VAT wants to find out, then, is whether these things
that must be true if CC-1003 is to be an instance of UC-"RENT-2"
are indeed true, or whether the bottom line in the entailments
of UC-"LEND", example 78), is fulfilled instead. VAT tries to
decide this by following arrow 5 to the verbalization of the
Japanese text. Of course there are many ways in which the
answer might appear in that verbalization, if it appears at all.
If VAT is unsuccessful in its search it will have to ask the
user directly:



84) V: IS CC-1003 CATEGORIZED AS LEND OR RENT?



In 73), however, we have made things easy by supplying a
context which ought to decide the question. It will be remembered
that the second sentence in 73) expresses CC-2002, which is the
REASON for CC-2003, or what is expressed in the first sentence.
Now, CC-2002 is categorized in the Japanese as an instance of
UC-"HITUYOO DA", which means something like "be needed". Let us
assume that the Japanese lexicon contains an entry for this
category which includes the following:



85) CC-A C> UC-"HITUYOO DA"
    E>
    CC-A F> VB-"HITUYOO DA" (PI-B↑BEN, PI-C↑PAT)
    CC-A F> VB-WANT (PI-B, CC-D)
    CC-D F> VB-HAVE (PI-B, PI-C)



The case frame immediately under the E> identifies PI-B as the
beneficiary, the person who needs something, while the thing
needed is labeled PI-C. The second line under the E> says that
an alternative framing is possible in terms of an abstract verb
WANT, wherein PI-B wants CC-D, and CC-D is then characterized in
terms of PI-B having PI-C. In other words, when one needs
something, one wants to have it. (If this is not always true, at
least it is the expected entailment.)

If 85) is going to provide an answer to 84), there must
also be a general principle of some kind which relates what is
entailed by CC-2002 to what is entailed by CC-2003. This
general principle can be stated as follows:



86) CC-A F> VB-WANT (PI-B, CC-C)
    CC-D F> VB-"E" (PI-B↑AGT)
    CJ-REASON (CC-A, CC-D)
    E>
    CC-D E> CC-C



The first line says that PI-B wants CC-C. The second line says
that PI-B does something. The third line says that his wanting
CC-C is the reason he does something. All this together is
then said to entail that his doing something entails what he
wants, or CC-C. In other words, if one wants something, and does
something because of that, then what one does must entail what
one wants.
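Principle 86) has the shape of a simple inference rule, which can be sketched as follows. The dictionaries and the function name `apply_reason_principle` are hypothetical conveniences for the illustration; the report does not say how VAT stores such statements.

```python
def apply_reason_principle(wants, acts, reasons):
    """Derive (act, wanted) entailment pairs in the manner of principle 86).

    wants   -- {cc: (person, wanted_cc)}  for statements CC F> VB-WANT (...)
    acts    -- {cc: agent}                for statements CC F> VB-"E" (agent)
    reasons -- set of (reason_cc, act_cc) for statements CJ-REASON (...)
    """
    entailed = set()
    for reason_cc, act_cc in reasons:
        if reason_cc in wants and act_cc in acts:
            person, wanted_cc = wants[reason_cc]
            # The one who wants must be the one who acts.
            if acts[act_cc] == person:
                entailed.add((act_cc, wanted_cc))
    return entailed


derived = apply_reason_principle(
    wants={"CC-A": ("PI-B", "CC-C")},
    acts={"CC-D": "PI-B"},
    reasons={("CC-A", "CC-D")},
)
```

The single derived pair corresponds to the last line of 86), CC-D E> CC-C.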

During the verbalization of CC-2002 as part of the
verbalization of the Japanese text, VAT will have recorded the fact
that CC-2002 was categorized as an instance of UC-"HITUYOO DA",
and will have entered the following statements in accordance
with 85):



87) CC-2002 C> UC-"HITUYOO DA"
    E>
    CC-2002 F> VB-"HITUYOO DA" (PI-2001↑BEN, PI-2902↑PAT)
    CC-2002 F> VB-WANT (PI-2001, CC-2904)
    CC-2904 F> VB-HAVE (PI-2001, PI-2902)

* ' ■ ■ 

At this point VAT also has all the particulars needed for
principle 86), which can be filled out as follows:

88) CC-2002 F> VB-WANT (PI-2001, CC-2904)
    CC-2003 F> VB-"KAS-" (PI-2001↑AGT)
    CJ-REASON (CC-2002, CC-2003)
    E>
    CC-2003 E> CC-2904

The first line of 88) was obtained from 87). The second line
was obtained from 76). The third line comes from line 8 of the
Japanese verbalization set forth at the beginning of this
section. What we are interested in now is the last line of 88),
which says in effect that CC-2003 is categorized in such a way
that CC-2904 is true, and looking back to 87) we see that
CC-2904 involves PI-2001 having PI-2902, or the agent of kasu
having okane 'money'. Making the necessary correspondences in
English, this means that CC-1003 must be categorized in such a
way that CC-1904 is true, where:



89) CC-1904 F> VB-HAVE (PI-1001, PI-1902)

This is exactly what VAT finds as the last line of 82). Since
82) is entailed by UC-"RENT-2" but not by UC-"LEND", the
question in 84) has been answered, and the arrow labeled 6 in Figure
4 carries back the choice of UC-"RENT-2" into the English
verbalization, which then proceeds as it did in the translation
illustrated earlier.
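The final comparison, checking which candidate category entails the required state 89), can be sketched as a lookup over instantiated entailments. The states under RENT-2 are those of examples 82) and 83). The states listed under LEND are an assumption made for the sketch (lending is taken to involve only the transfer of the object, with the same numbering), since the instantiated LEND entry is not shown in this section.

```python
# States entailed by each candidate, written as (verb, haver, thing).
CANDIDATE_STATES = {
    "RENT-2": {
        "CC-1903": ("HAVE", "PI-1901", "PI-1902"),  # rentee has the money
        "CC-1904": ("HAVE", "PI-1001", "PI-1902"),  # renter has the money
        "CC-1905": ("HAVE", "PI-1001", "PI-1003"),  # renter has the object
        "CC-1906": ("HAVE", "PI-1901", "PI-1003"),  # rentee has the object
    },
    # Assumed for illustration: only the object changes hands.
    "LEND": {
        "CC-1905": ("HAVE", "PI-1001", "PI-1003"),
        "CC-1906": ("HAVE", "PI-1901", "PI-1003"),
    },
}


def categories_entailing(required_state, candidates):
    """Return the candidate categories whose entailments include the state."""
    return [name for name, states in candidates.items()
            if required_state in states.values()]


# Example 89): the agent must end up having the money.
required = ("HAVE", "PI-1001", "PI-1902")
matches = categories_entailing(required, CANDIDATE_STATES)
```

When exactly one category remains, its name is what arrow 6 carries back into the English verbalization; an empty or ambiguous result corresponds to the cases where question 84) must be put to the user.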




By this complex procedure, involving comparisons of
entailments within and across languages, as well as the general
principle stated in 86), VAT has been able to make the correct
choice. So long as the answer to 84) was derivable from
something discoverable within the Japanese verbalization, VAT could
in principle succeed. It is clear, however, that the route to
the answer could be extremely complex, involving chains of
entailments of unforeseeable length. There is no doubt that such
procedures are necessary to answer such questions, and that they
present an extraordinary challenge to our techniques for
information storage and search.



IX. Miscellaneous Problems in Translation

Since we have spent considerable time looking into various
specific translation problems beyond those illustrated above,
we present here a few additional examples of the sorts of things
that will have to be taken into account during the
implementation of machine translation along the lines suggested above.
Two of these examples will, like those in the last section,
involve the choice of a category in the target language when that
choice is not directly provided by interlingua. One has to do
with the translation of Japanese osieru into English; the
other, the translation of English give into Japanese. A third
example will illustrate the kind of problem that arises at the
stage of subconceptualization and sentence formation.

The following three sentences illustrate three possible
English translations of the Japanese verb osieru:

90) Gaido wa Kookyo ga doko ni aru ka osiete kuremasita.
    guide Imperial Palace where is showed
    Soko kara Tookyoo Tawaa e ikimasita.
    there from Tokyo tower to went

    The guide showed us where the Imperial Palace was.
    From there we went to the Tokyo Tower.

91) Gaido wa Kookyo ga doko ni aru ka osiete kuremasita
    guide Imperial Palace where is told
    ga watasitati ga soko e itta toki ni moo simatte
    but we there to went when already closed
    imasita.
    was

    The guide told us where the Imperial Palace was, but when
    we got there it was already closed.

92) Kimatu siken no tame ni sensei wa
    semester-final exam pi. for the purpose teacher
    Kookyo ga doko ni aru ka osiete kudasaimasita.
    Imperial Palace where is taught

    For the final exam the teacher taught us where the Imperial
    Palace was.

Each of these examples contains the phrase:

93) Kookyo ga doko ni aru ka osiete

which is translated in three different ways, determined by the
context:

in 90)    show where the Imperial Palace is
in 91)    tell where the Imperial Palace is
in 92)    teach where the Imperial Palace is

The difference is localized in the translation of osiete, a
participial form of the verb osieru. This verb may be
translated into English as show, tell, or teach according to the
context, and the problem is to identify what the determining
factors are.




The Japanese category UC-"OSIE-" as well as the English
categories UC-"SHOW", UC-"TELL", and UC-"TEACH" are all included
within the more abstract category UC-COMMUNICATION, which can be
defined as follows:

94) CC-A C> UC-COMMUNICATION
    E>
    CC-A F> VB-INTEND (PI-B, CC-C)
    CC-C S> CJ-CAUSE (CC-D, CC-E)
    CC-D F> VB-ACT (PI-B)
    CC-E S> CJ-CHANGE (CC-F, CC-G)
    CC-F F> -VB-KNOW (PI-H, CC-I)
    CC-G F> VB-KNOW (PI-H, CC-I)

That is, for a CC to be categorized as an instance of
UC-COMMUNICATION entails that someone (PI-B) intends something (CC-C),
and that what he intends is that CC-D will cause CC-E. CC-D is
some act that PI-B performs, and CC-E, caused by that act, is a
change from state CC-F to state CC-G. CC-F is a state in which
another person (PI-H) does not know something (CC-I), and CC-G
is a state in which that person does know it.

Subcategories of UC-COMMUNICATION may differ as to the
nature of the act (CC-D) performed by the communicator, as to
the kind of knowing that results (e.g. whether it is retained
in surface or deep memory), and in other ways such as the
authoritativeness of the communicator with respect to what is
communicated (CC-I). The Japanese category UC-"OSIE-", for
example, is less specific as to the act performed by the
communicator; apparently he can do almost anything that will have a
communicative function. UC-"TELL", on the other hand, entails a
verbal act; UC-"SHOW" an act which directs the other person's
visual attention to CC-I, and UC-"TEACH" an act which is
didactic in nature. It is difficult to delimit the acts which
qualify as teaching, but evidently they must have an instructional
quality which is not necessary for UC-"OSIE-". UC-"TEACH" may
also be unique in requiring that the knowing (CC-G) be deep or
long-term knowing, at least in the intention of PI-B. Japanese
UC-"OSIE-" may, for its part, require that PI-B be authoritative
with respect to the content of what is being communicated (CC-I).

But how is it, for example, that the context in 90)
restricts the translation of "OSIE-" to "SHOW"? The second
sentence in 90) says that we went from there (soko), whose referent
is the location of the Imperial Palace. Thus, at the time of
the communicative event, we must have been at the Imperial
Palace. Now, there is evidently a general principle, like 86) in
the last section, which says that a verbal act is not used to
communicate where something is when the beneficiary of the act
is already at that place. There is evidently no such
restriction on directing visual attention to where it is, hence
UC-"SHOW" is preferred to UC-"TELL". Since there is nothing in the
context of 90) to suggest that teaching methods were involved,
UC-"SHOW" is left as the only candidate.

In 91) the situation is otherwise. The second clause makes
it clear through the phrase translated "when we got there" that
we were not at the Imperial Palace at the time of the
communicative act. Another general principle says that visual
attention can be directed only at things within visual range. Thus
UC-"SHOW" is in this case ruled out, as is UC-"TEACH" again
because of the absence of didactic context. UC-"TELL" is thus the
choice here.

In 92) the didactic context is evident. The Japanese words
kimatu, siken, and sensei all belong within the semantic field
of teaching, a fact to be noted in the lexical entry for each of
them. Hence the English category UC-"TEACH", obviously a member
of the same semantic field, will be the choice here. Probably
we should also take account of the fact that the idiomatic verb
at the end of this sentence, literally 'gave', reinforces the
superior relationship of the communicator: in this case, the
fact that he is authoritative with respect to what is being
communicated.
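The three decisions just described can be condensed into a small decision function. It is a deliberately crude sketch: the two boolean parameters stand for conclusions that VAT would have to derive from the context, the function name `translate_osieru` is hypothetical, and giving the didactic context priority when both conditions hold is an assumption that the three examples do not test.

```python
def translate_osieru(recipient_at_location, didactic_context):
    """Choose among SHOW, TELL and TEACH for Japanese OSIE-.

    recipient_at_location -- the recipient is at the place being
                             communicated about (as in example 90)
    didactic_context      -- the context belongs to the semantic field
                             of teaching (as in example 92)
    """
    if didactic_context:
        return "TEACH"
    if recipient_at_location:
        # A verbal act is not used to say where something is
        # when the recipient is already there.
        return "SHOW"
    # Visual attention cannot be directed at something out of sight.
    return "TELL"


choices = [translate_osieru(True, False),    # context of 90)
           translate_osieru(False, False),   # context of 91)
           translate_osieru(False, True)]    # context of 92)
```

The hard part, deriving the two conditions from movement and temporal information in the surrounding text, is exactly what the sketch leaves out.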

The point of this example of the translation of osieru is
to emphasize the complexity of the criteria which may have to be
invoked to decide between possible translations. Here we have
seen a link between different kinds of communicative acts and
the location of the recipient of the communication, information
on the latter being derivable from information about the move-
ment of the recipient to or from the place of communication, to-
gether with temporal information. It is also of interest that
this example, like the second example in section VIII, led us to
recognize certain general principles: that one does not com-
municate verbally about where something is when the addressee is
already there, for example, and the obvious principle that one
does not call visual attention to something that is not visible.
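The elimination of candidates described above can be sketched in a few lines of Python. This is only an illustration, not VAT's actual procedure: the class and field names are hypothetical, the third English category is taken here to be UC-"TELL", and treating a didactic context as decisive for UC-"TEACH" is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class CommunicativeAct:
    """Facts about one act of osieru (all field names are hypothetical)."""
    addressee_already_there: bool   # recipient is at the place being talked about
    object_in_visual_range: bool    # the thing communicated about can be seen
    didactic_context: bool          # the text supplies a teaching context


def english_categories_for_osieru(act: CommunicativeAct) -> list[str]:
    """Return the English categories that survive the general principles."""
    # Assumption: an evident didactic context selects TEACH outright.
    if act.didactic_context:
        return ["TEACH"]
    candidates = ["TELL", "SHOW"]
    # One does not communicate verbally about where something is
    # when the addressee is already there.
    if act.addressee_already_there:
        candidates.remove("TELL")
    # One does not call visual attention to something that is not visible.
    if not act.object_in_visual_range:
        candidates.remove("SHOW")
    return candidates
```

When neither principle applies, both UC-"TELL" and UC-"SHOW" remain, and some further criterion (or a question to the user) would be needed.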

Detailed implementation of this kind of translation research
will undoubtedly lead to the recognition of a number of such
principles.

The word kudasaimasita in 92) leads us to a different kind
of complication, that involved in the need to pay special atten-
tion in Japanese verbalization to the social relationship ex-
isting between the speaker and various other persons. Although
we are changing the direction of translation here, it is of some
interest to consider questions that arise in translating the
English category UC-"GIVE" into Japanese. We may assume that
UC-"GIVE" has the entailments listed in example 54), section VI
above, and that furthermore the categories underlying all the
Japanese verbs to be mentioned share those same entailments.
Each Japanese category, however, has additional entailments of
its own, and it is the nature of those additional entailments
that we are interested in.

The verb kureru, for example, is used to express instances
of a category whose entailments include those of UC-"GIVE" plus
the following (where PI-B is the agent and PI-C the beneficiary
of the giving):

95)  CC-A C> UC-"KURE-"

          VB-CLOSE-TO-SPEAKER (PI-C)
          VB-CLOSER-TO-SPEAKER-THAN (PI-C, PI-B)
          -VB-HIGHER-THAN (PI-B, PI-C)

That is, UC-"KURE-" is the category chosen if the beneficiary
of the giving is socially close to the speaker, closer to the
speaker than the agent of the giving, and the agent is not so-
cially higher than the beneficiary. In translating texts where
such information is relevant, VAT will either have to store a
network of social relations linking all the relevant individ-
uals, a network which may in part be derivable from the text, or
it will have to ask the user questions like:

96)  V: IS PI-2849 SOCIALLY CLOSE TO PI-2001?

     V: IS PI-2849 SOCIALLY CLOSER TO PI-2001 THAN PI-2365?

     V: IS PI-2365 SOCIALLY HIGHER THAN PI-2849?
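A minimal sketch of the two options just mentioned, a stored network of social relations with a fall-back to questioning the user, might look as follows in Python. The class, its methods, and the yes/no answer convention are assumptions made for illustration; only the relation names and the wording of the questions come from the examples above.

```python
QUESTIONS = {
    "VB-CLOSE-TO-SPEAKER": "V: IS {0} SOCIALLY CLOSE TO {speaker}?",
    "VB-CLOSER-TO-SPEAKER-THAN": "V: IS {0} SOCIALLY CLOSER TO {speaker} THAN {1}?",
    "VB-HIGHER-THAN": "V: IS {0} SOCIALLY HIGHER THAN {1}?",
}


class SocialNetwork:
    """Stores known social relations; asks the user about unknown ones."""

    def __init__(self, speaker, ask=input):
        self.speaker = speaker      # the PI who is the speaker, e.g. "PI-2001"
        self._ask = ask             # callable that puts a question to the user
        self._facts = {}            # (relation, args) -> True / False

    def record(self, relation, *args, value=True):
        """Store a relation, for instance one derived from the text."""
        self._facts[(relation, args)] = value

    def holds(self, relation, *args):
        """Look the relation up, asking the user only if it is unknown."""
        key = (relation, args)
        if key not in self._facts:
            question = QUESTIONS[relation].format(*args, speaker=self.speaker)
            answer = self._ask(question)
            # Assumption: any answer beginning with "Y" counts as yes.
            self._facts[key] = answer.strip().upper().startswith("Y")
        return self._facts[key]
```

Each answer is stored, so the same question is never put to the user twice.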

The verb kudasaru, whose idiomatic function appeared in
92), is used to express instances of a category whose entail-
ments are as follows:

97)  CC-A C> UC-"KUDASAR-"
          ...
          VB-CLOSE-TO-SPEAKER (PI-C)
          VB-CLOSER-TO-SPEAKER-THAN (PI-C, PI-B)
          VB-HIGHER-THAN (PI-B, PI-C)

In other words, the entailments of UC-"KUDASAR-" are the same
as those of UC-"KURE-" except that the agent of the giving is
socially higher than the beneficiary. (It was the exalted po-
sition of sensei, the teacher, in 92) that led to the use of
kudasaimasita in that sentence.)



Another possibility is the verb yaru:

98)  CC-A C> UC-"YAR-"
          E>
          ...
          { -VB-CLOSE-TO-SPEAKER (PI-C)             }
          { -VB-CLOSER-TO-SPEAKER-THAN (PI-C, PI-B) }
          VB-HIGHER-THAN (PI-B, PI-C)
          -VB-RESPECTED (PI-C)

The braces indicate a disjunction. Thus one of the ways in
which this category differs from the last two is in the bene-
ficiary of the giving not being socially close to the speaker,
or else in his not being closer to the speaker than the agent of
the giving. As in 97) the agent is socially higher than the
beneficiary. Furthermore, as stated in the last line, the bene-
ficiary is not being treated respectfully by the speaker.

The verb ageru is like yaru, except that the agent of the
giving is not socially higher than the beneficiary:

99)  CC-A C> UC-"AGE-"
          E>
          ...
          { -VB-CLOSE-TO-SPEAKER (PI-C)             }
          { -VB-CLOSER-TO-SPEAKER-THAN (PI-C, PI-B) }
          -VB-HIGHER-THAN (PI-B, PI-C)
          -VB-RESPECTED (PI-C)



The last verb that we will consider here is sasiageru:

100) CC-A C> UC-"SASIAGE-"
          E>
          ...
          VB-CLOSE-TO-SPEAKER (PI-B)
          VB-HIGHER-THAN (PI-C, PI-B)
          VB-RESPECTED (PI-C)



In other words the agent of the giving is socially close to the
speaker, while the beneficiary is socially higher than the agent
and is being treated respectfully by the speaker. It is also
possible to use this category when the agent is not socially
close to the speaker, but evidently Japanese speakers are not
completely comfortable about the choice in that case; never-
theless, there is no other category available.
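The additional entailments in 95) and 97) through 100) can be read as tests on a giving situation, which suggests the following hedged Python sketch of the choice among the five Japanese categories. The data class, its field names, and the function are hypothetical; the sketch covers only the entailments listed above, not those shared with UC-"GIVE", and it does not model the marginal use of UC-"SASIAGE-" when the agent is not close to the speaker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Giving:
    """Social facts about one act of giving (field names are hypothetical).
    PI-B is the agent of the giving, PI-C the beneficiary."""
    c_close_to_speaker: bool      # VB-CLOSE-TO-SPEAKER (PI-C)
    c_closer_than_b: bool         # VB-CLOSER-TO-SPEAKER-THAN (PI-C, PI-B)
    b_higher_than_c: bool         # VB-HIGHER-THAN (PI-B, PI-C)
    c_higher_than_b: bool         # VB-HIGHER-THAN (PI-C, PI-B)
    c_respected: bool             # VB-RESPECTED (PI-C)
    b_close_to_speaker: bool      # VB-CLOSE-TO-SPEAKER (PI-B)


def braces(g: Giving) -> bool:
    """The disjunction marked by braces in 98) and 99)."""
    return (not g.c_close_to_speaker) or (not g.c_closer_than_b)


CATEGORIES = {
    "KURE-": lambda g: g.c_close_to_speaker and g.c_closer_than_b
                       and not g.b_higher_than_c,
    "KUDASAR-": lambda g: g.c_close_to_speaker and g.c_closer_than_b
                          and g.b_higher_than_c,
    "YAR-": lambda g: braces(g) and g.b_higher_than_c and not g.c_respected,
    "AGE-": lambda g: braces(g) and not g.b_higher_than_c and not g.c_respected,
    "SASIAGE-": lambda g: g.b_close_to_speaker and g.c_higher_than_b
                          and g.c_respected,
}


def japanese_categories_for_give(g: Giving) -> list[str]:
    """Return every category whose additional entailments are satisfied."""
    return [name for name, test in CATEGORIES.items() if test(g)]
```

For a teacher giving something to the speaker's side, as in 92), only UC-"KUDASAR-" passes; an empty result would mean that none of the five categories fits the facts as recorded.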



One way in which VAT might be able to find answers to ques-
tions regarding social relationships is through the occurrence
in the text of categorizations that entail such relationships.
For example, the occurrence of an instance of UC-"SENSEI" in
example 92) entails a socially higher status for the PI thus
categorized than for the PIs who are this teacher's students.
It thus leads to the choice of UC-"KUDASAR-". Kinship terms
also provide examples of automatically entailed social status.
If we take a PI that is an instance of UC-"OTOOSAN" 'father',
for example, there are entailments of the following sort:

101) PI-A C> UC-"OTOOSAN"

          VB-FATHER-OF (PI-A, PI-B)
          VB-HIGHER-THAN (PI-A, PI-B)

That is, PI-A must be the father of someone (PI-B), and will be
socially higher than that someone. It will also be the case
that:

102) VB-FATHER-OF (PI-A, PI-B)
     SP-SPEAKER (PI-A)
     VB-CLOSE-TO-SPEAKER (PI-B)

That is, if the PI-A who is the father of PI-B is at the same
time the speaker, PI-B will be socially close to the speaker.
The entailments derived from both 101) and 102) are relevant to
the choice of a translation for English UC-"GIVE", as sketched
above.
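As a rough illustration of how entailments like those in 101) and 102) could supply social facts without questioning the user, the Python sketch below derives them from a kinship categorization. The function names and the tuple representation of statements are assumptions; only the relation names are taken from the examples.

```python
def entailments_of_otoosan(father, child):
    """Entailments in 101): the father is socially higher than the child."""
    return {
        ("VB-FATHER-OF", father, child),
        ("VB-HIGHER-THAN", father, child),
    }


def add_closeness_to_speaker(facts, speaker):
    """Rule in 102): a child of the speaker is socially close to the speaker."""
    derived = set(facts)
    for fact in facts:
        if fact[0] == "VB-FATHER-OF" and fact[1] == speaker:
            derived.add(("VB-CLOSE-TO-SPEAKER", fact[2]))
    return derived
```

Facts obtained this way could then feed the choice among the giving categories, with the user questioned only about relations that remain unknown.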

So far all our examples of translation problems have in-
volved categorization. Certainly, however, there are also prob-
lems which arise in subconceptualization, and in the associated
application of syntactic processes which lead to clause and sen-
tence formation. We have not paid as much attention to ques-
tions of this sort, since for the most part we have been able to
translate sentence for sentence with reasonable success. One
example which seems fairly clear arose early in our investiga-
tion, and will be repeated here as an illustration of the chal-
lenges which are likely to arise in this respect.

At issue is the translation of the English sentence in 103),
the first sentence of a fable, into the sequence of two Japa-
nese sentences in 104):

103) There was once a wolf who saw a lamb drinking at a river
     and wanted to create an excuse to eat it.



104) Mukasi aru     tokoro ni kawa  de mizu  o nonde
     once   certain place  in river at water   drinking

     iru ko-hituzi o mituketa ippiki no ookami imasita.
     be  lamb        saw      one       wolf   was

     Sosite sono ookami wa sono ko-hituzi o taberu tame no
     and    that wolf      that lamb        eat    for

     iiwake o tukuri-ta-gatte   imasita.
     excuse   make-want-seeming was

The question we are concerned with is why it is desirable for
the Japanese translation to create two sentences, where the Eng-
lish had only one.

We may note first of all that the English sentence contains
two conjoined relative clauses ("who saw... and wanted...").
Japanese relative clauses differ syntactically from those in
English in being preposed to the noun they modify. Hence, if
the Japanese were to preserve the structure of the English in a
single sentence, the speaker would have to say everything that
the wolf saw and wanted before he ever was able to mention the
wolf. The subject of the seeing and the wanting would be held
in suspense for so long that addressee or reader might have some
problem in interpreting what was being said. Another reason for
not repeating the English structure of two relative clauses has
to do with the beginning of the next sentence: in English, "For
that purpose...he accused the lamb of stirring up the water..."
The referent of that purpose in English is clear. It refers to
the immediately preceding relative clause: "wanted to create
an excuse to eat it." His wanting to create this excuse was his
purpose for accusing the lamb. In Japanese, however, if the
clause in question were preposed to ookami (which would then be
followed by the main verb of the sentence, imasita), the refer-
ent of this would no longer be clear. By making the
clause about the wolf's wanting to create the excuse into an in-
dependent sentence, the Japanese is able to refer to it directly
at the beginning of the next sentence without difficulty.

We have not yet formalized the processes by which VAT would
decide to create two sentences in the translation where the
source verbalization has one. Evidently principles such as the
following must eventually be built into VAT. First, there must
be a restriction of some kind on the amount of material that can
be included in a preposed relative clause, and perhaps especial-
ly in a relative clause that introduces the main character of a
story (whose introduction cannot be put off for too long). Sec-
ond, there is a need for a sentence-introductory phrase like for
that purpose to have a clear referent which immediately precedes
it. The task of introducing such principles into VAT's opera-
tions is formidable, but not impossible of accomplishment.
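Since these processes have not been formalized, the following Python sketch is only one guess at how the two principles might be operationalized. The word-count limit, its value, and the function and parameter names are all hypothetical; English clauses stand in for the Japanese ones.

```python
MAX_PREPOSED_WORDS = 10   # hypothetical limit on preposed material


def plan_relative_clauses(clauses, limit=MAX_PREPOSED_WORDS,
                          last_clause_is_antecedent=False):
    """Split relative clauses into those kept preposed to the head noun and
    those detached as independent sentences.

    clauses: clause strings in their original order.
    last_clause_is_antecedent: True when the next sentence opens with a
        phrase like "for that purpose" that refers to the last clause.
    """
    kept, detached = [], []
    words_used = 0
    for index, clause in enumerate(clauses):
        length = len(clause.split())
        is_last = index == len(clauses) - 1
        # Second principle: the antecedent must immediately precede the
        # referring phrase, so it becomes a sentence of its own.
        forced_out = is_last and last_clause_is_antecedent
        # First principle: limit the material placed before the head noun.
        # Once one clause is detached, later ones follow it to keep order.
        if not detached and not forced_out and words_used + length <= limit:
            kept.append(clause)
            words_used += length
        else:
            detached.append(clause)
    return kept, detached
```

Applied to the two clauses of 103), either principle alone is enough to detach the second clause, which matches the two-sentence translation in 104).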






X. Future Work



It will be obvious to anyone who has tried to deal with the
sorts of information and processes mentioned in this report that
we have only inserted a few pin pricks into a gigantic monster
whose eventual conquest calls for years of patient work. With-
out pretending to cover everything that needs to be done, we
summarize below some of the more obvious lines of research that
the report suggests.

(1) During subconceptualization we make use of statements
like CJ-YIELDS (CC-1002, CC-1003). We need to extend and clar-
ify the set of relations to which CJ-YIELDS belongs: the re-
lations which exist between the various conceptual chunks of a
text, whether these chunks be large or small.

(2) Such relations have surface consequences, of the kind
illustrated in examples 8), 9), 19), and 22) above. Such conse-
quences are in fact quite varied and subtle, being dependent in
part on complex contextual considerations. Their clarification
calls for extensive textual analysis.

(3) We now have a primitive device for introducing di-
gressions and parentheses into the subconceptual hierarchy. We
need to look at digressions in greater detail to determine more
precisely what constitutes a digression, how best to formalize
the processes by which digressions are introduced, and how they
are expressed under various conditions.

(4) During the initial period of this project we spent
some time investigating the manner in which various textual
genres constrain the processes of verbalization. It will be
necessary to return to this question with a view toward con-
structing conceptual "grammars" of scientific articles, news
reports, stories of various kinds, etc.



(5) Although the general nature of the framing process
seems to be understood, its details need refinement. The best
inventory of "case" relations has not yet been established, nor
has the interaction between cases and other statuses which PIs
may have, such as topic or given information.

(6) The treatment of "modifiers" (adjectives, relative
clauses, adverbs) is presently oversimplified. More work on
their introduction and expression is called for.

(7) The treatment of "inflections" (tense, aspect, arti-
cles, number, and the like), though it has been given a fair
degree of attention already, needs to be expanded and extended.

(8) At present, if a PI has a proper name we treat it as
a unique name and use it for the lexicalization of the PI when-
ever the latter is not "given". This procedure ignores the many
interesting constraints which govern the choice among competing
proper names for the same PI. Investigation of this area is
dependent on a more detailed understanding of a variety of
interpersonal relationships.




(9) When a PI does not have a proper name and is not pro-
nominalized, it must be categorized in some way that will lead
to lexicalization in terms of a common noun. The factors which
influence such categorization are of basic psychological inter-
est, involving such questions as whether conceptual "features"
are adequate to account for how a particular PI is categorized,
and the extent to which continuous degrees of codability must be
recognized. These factors will have to be included eventually
in the lexicon, so that this problem is really the problem of
how the lexicon should be developed.

(10) A practical problem involves the procedures by which
lexical entailments are utilized. Should all the entailments
associated with every item in a text be specifically created by
the program, or should they somehow be held in some latent con-
dition until they are needed? It is important to avoid the
mushrooming of entailments beyond necessity, but exactly how it
can be avoided is not yet clear.

(11) At present, if a PI is "given" it is automatically
pronominalized. We know that pronominalization is influenced by
other factors; for example, it will often not take place if
ambiguity is likely to result. Such factors as a search for
possible ambiguity will have to be introduced into the system.



(12) Among the "readjustments" (section VII) which are

applied after a sentence has been produced, we have dealt with
three types: the introduction of givenness, the introduction of
the relation IDENTIFIED between a category and a PI, and the
creation of a CC which represents the production of the sentence
as an event in itself. Other kinds of readjustments (that is,
changes in discourse information which result from the produc-
tion of a sentence) need to be investigated.

(13) Our surface structure format at present consists of
a series of statements representing the major components of a
sentence, with all necessary surface information included. We
have designed such structures so that they can be algorithmically
converted into sequences of words and sentences, that is, into
a normally readable text. The algorithm needs to be specifically
implemented.

(14) With reference to the translation component in par-
ticular, it will be necessary to look into differences in the
way different languages subconceptualize various kinds of con-
tent, differences in the treatment of various genres, differ-
ences in the placement of sentence boundaries, and so on.

(15) The construction of an extensive interlingua list (a
dictionary of direct category correspondences) between pairs of
languages is called for.

(16) Procedures involved in the use of entailments for the
discovery of indirect category correspondences between two lan-
guages must be refined. Aside from the need to build up a large
and detailed working lexicon for each language that is kept in
mind, it will be necessary to find ways of optimizing the search
for corresponding entailments when no direct correspondence in
interlingua is found.

(17) With respect to parsing, we need to make precise the
techniques by which textual clues are utilized in the recon-
struction of the verbalization processes by which the text was
created. These clues are many and varied, differing to some
extent from language to language, and again a large-scale empir-
ical investigation is called for.
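To make item (16) concrete, the following Python sketch shows one possible shape for the two-stage search: a direct interlingua lookup, followed by a comparison of entailments when no direct correspondence exists. The function, the set-based representation, and the dictionary entries are illustrative assumptions rather than VAT's actual data structures.

```python
def translate_category(source, interlingua, candidates, known_facts):
    """Find a target-language category for a source-language category.

    interlingua: dict of direct category correspondences.
    candidates:  dict mapping each target category to the set of additional
                 entailments it carries.
    known_facts: set of statements established for the current context.
    """
    # Stage 1: direct correspondence listed in the interlingua.
    if source in interlingua:
        return interlingua[source]
    # Stage 2: indirect correspondence. Keep candidates whose entailments
    # all hold in context and prefer the most specific one.
    best = None
    for name, entailments in candidates.items():
        if entailments <= known_facts:
            if best is None or len(entailments) > len(candidates[best]):
                best = name
    return best   # None if no candidate fits


# Illustrative data only.
INTERLINGUA = {"WOLF": "OOKAMI"}
GIVE_CANDIDATES = {
    "KURE-": frozenset({"VB-CLOSE-TO-SPEAKER(PI-C)",
                        "VB-CLOSER-TO-SPEAKER-THAN(PI-C,PI-B)",
                        "-VB-HIGHER-THAN(PI-B,PI-C)"}),
    "KUDASAR-": frozenset({"VB-CLOSE-TO-SPEAKER(PI-C)",
                           "VB-CLOSER-TO-SPEAKER-THAN(PI-C,PI-B)",
                           "VB-HIGHER-THAN(PI-B,PI-C)"}),
}
```

A linear scan over candidates is the simplest choice; optimizing the search, as the item calls for, would presumably involve indexing categories by their entailments.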

Attacks on these problems are appropriate on at least four
fronts. First, investigators should undoubtedly continue to en-
gage in traditional "armchair" linguistics, involving cogitation
and discussions by persons steeped in the languages, procedures,
and theoretical issues involved. Second, one can adapt and ex-
ploit whatever materials relevant to these questions can be
found in the literature on these languages, on linguistics, on
artificial intelligence, on text structure, and so on. Third,
it will be possible to develop new facts through experimental
work. As an example, one can investigate specific examples of
human translation in order to establish ranges of variation in
different verbalizations of what is essentially the same con-
tent, and to determine the optimum correspondences between
two languages in specific cases. Finally, of course, the devel-
opment of a computer system can proceed in parallel with these
other lines of research, handling the ever-increasing complexity
in a way that the computer is uniquely suited for, and providing
an indispensable testing ground for each new feature or process
that is hypothesized.






Footnotes

1. Roger W. Brown and Eric H. Lenneberg, "A Study in Language
and Cognition," Journal of Abnormal and Social Psychology 49:
454-462 (1954).

2. Wallace L. Chafe, "Language and Consciousness," Language
50:111-133 (1974).

3. Cf. the discussion of "deep" memory in Chafe's "Language
and Memory," Language 49:261-281 (1973).

4. What follows is based on the analysis by Susumu Kuno in
his The Structure of the Japanese Language (MIT Press, 1973),
chapter 9.







UNCLASSIFIED
SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)

REPORT DOCUMENTATION PAGE
(READ INSTRUCTIONS BEFORE COMPLETING FORM)

 1. REPORT NUMBER: RADC-TR-74-271
 2. GOVT ACCESSION NO.:
 3. RECIPIENT'S CATALOG NUMBER:
 4. TITLE (and Subtitle): AN APPROACH TO VERBALIZATION AND
    TRANSLATION BY MACHINE
 5. TYPE OF REPORT & PERIOD COVERED: Final Report,
    1 Jun 72 - 31 May 74
 6. PERFORMING ORG. REPORT NUMBER: None
 7. AUTHOR(s): Dr. Wallace L. Chafe
 8. CONTRACT OR GRANT NUMBER(s): F30602-72-C-0406
 9. PERFORMING ORGANIZATION NAME AND ADDRESS:
    The University of California at Berkeley
    Department of Linguistics
    Berkeley, California 94720
10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS:
    Program Element 62702F
    Job Order No. 45940805
11. CONTROLLING OFFICE NAME AND ADDRESS:
    Rome Air Development Center (IRDT)
    Griffiss Air Force Base, New York 13441
12. REPORT DATE: October 1974
13. NUMBER OF PAGES: 114
14. MONITORING AGENCY NAME & ADDRESS (if different from
    Controlling Office): Same
15. SECURITY CLASS. (of this report): UNCLASSIFIED
15a. DECLASSIFICATION/DOWNGRADING SCHEDULE: N/A
16. DISTRIBUTION STATEMENT (of this Report):
    Approved for public release; distribution unlimited.
17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20,
    if different from Report): Same
18. SUPPLEMENTARY NOTES: None
19. KEY WORDS (Continue on reverse side if necessary and identify
    by block number):
    Computational Linguistics
    Machine Translation
    Lexicography
    Computer Programming
    Artificial Intelligence
20. ABSTRACT (Continue on reverse side if necessary and identify
    by block number):

    The report documents performance on a 24 month R&D effort
    oriented toward the development of a computerized model for
    machine translation of natural languages. The model is built
    around a set of procedures called verbalization, intended to
    simulate the processes employed by a speaker or writer in
    turning stored information into words. Verbalization is seen
    to consist of subconceptualization and lexicalization
    processes which involve creative choices on the part of the
    verbalizer, together with algorithmic syntactic processes
    determined by the language being used. Translation is viewed
    as (1) the reconstruction of the verbalization processes
    which went into the original source language text and (2) the
    application of parallel verbalization processes in the target
    language. The target language verbalization looks for
    creative choices to the source language verbalization and
    tries to apply corresponding choices simultaneously with
    application of syntactic processes dictated by the grammar of
    the target language. Verbalization and translation processes
    are illustrated in some detail with examples taken from
    English and Japanese. Some of these processes have been
    implemented in an interactive program on CDC 6600 at the
    Lawrence Berkeley Laboratory (AEC), but the main intent of
    the report is to demonstrate the kinds of processes that need
    to be incorporated in such a system.

DD FORM 1473, 1 JAN 73. EDITION OF 1 NOV 65 IS OBSOLETE.




MISSION
of
Rome Air Development Center

RADC is the principal AFSC organization charged with
planning and executing the USAF exploratory and advanced
development programs for electromagnetic intelligence
techniques, reliability and compatibility techniques for
electronic systems, electromagnetic transmission and
reception, ground based surveillance, ground
communications, information displays and information
processing. This Center provides technical or
management assistance in support of studies, analyses,
development planning activities, acquisition, test,
evaluation, modification, and operation of aerospace
systems and related equipment.

Source AFSCR 23-50, 11 May 70



